<html><head><title>Readme for analog 4.03</title></head> <body><h1>Readme for <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog 4.03</a></h1> <a name="Readme"><h2>Introduction</h2> </a> Analog is a program which analyses logfiles from WWW servers. It works on almost any operating system. It is designed to be fast and to produce attractive statistics. It's free software. <p>Beginners should read the <a href="Licence.txt">licence</a> followed by the section on <cite><a href="#start">Starting to use analog</a></cite>. <p>This Readme describes analog 4.03. For the latest version of analog, see the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. For examples of the output see <ul> <li><a href="http://www.statslab.cam.ac.uk/~sret1/stats/stats.html">our local statistics.</a> <li><a href="http://www.statslab.cam.ac.uk/~sret1/stats/statsme.html">statistics for my pages.</a> </ul> <p> Analog is free software, but its usage, distribution and modification are covered by a <a href="Licence.txt">licence</a>. You must agree to the terms of the licence before using the program. In particular, it comes with <em>no warranty</em>. <p> This is a version of the Readme in one page. If you're reading it on line, you might prefer the version on <a href="Readme.html">several smaller pages</a>. There is an <a href="#indx">index</a> at the end of this document. 
<p> Now you can go to <ul> <li><a href="#start">Starting to use analog</a> <li><a href="#custom">Customising analog</a> <li><a href="#meaning">What the results mean</a> <li><a href="#errors">Errors and warnings</a> <li><a href="#faq">Frequently asked questions (FAQ)</a> <li><a href="#mailing">Mailing lists</a> <li><a href="#helpers">Helper applications</a> <li><a href="#acknow">Acknowledgements</a> <li><a href="#whatsnew">What's new in this version?</a> <li><a href="#quickref">Quick reference</a> (for experts) <li><a href="#indx">Index</a> </ul> <hr> <hr> <a name="start"><h2>Starting to use analog</h2> </a> The only thing you need to run analog is to be able to read the logfiles which are produced by your web server. If you don't know what these logfiles are and where to find them, contact your internet service provider (ISP) or system administrator. Analog doesn't write the logfiles: it only reads them. <p> If you log in to your ISP's machine from your home machine, you have two options. If you have the right permissions, you can run analog on your ISP's machine. Otherwise, you can download (e.g., ftp) the logfiles from their machine to yours, and then run analog on your machine. <p> Once you've downloaded the right version of analog for your computer from the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a> (or a mirror site), you need to know how to set it up and run it. This is very easy, but the instructions are slightly different depending which platform you're using. <ul> <li><a href="#startmac">Mac users</a> <li><a href="#startpc">Windows users</a> <li><a href="#startos2">OS/2</a> <li><a href="#startux">Everyone else</a> (Unix, OpenVMS, Acorn, Windows 3.1 etc.) </ul> <p> If you can't manage to set up analog after reading the instructions, send a message to the <a href="#mailing">analog-help mailing list</a>. 
<hr> <hr> <a name="startmac"><h2>Starting to use analog on a Mac</h2> </a> When you download the Mac version of analog, it should unpack itself. (If it doesn't, you might have to run StuffIt Expander on it). You should then find in the analog directory a configuration file called <kbd>analog.cfg</kbd> and the analog application itself, as well as the Readme, the <a href="Licence.txt">Licence</a> (which you must read and agree to before using analog) and a couple of other files. When you double-click on the analog icon, it will run in its own window, and produce an output file called <kbd>Report.html</kbd>. (For help in interpreting the output, see <cite><a href="#meaning">What the results mean</a></cite>.) The window will then close if there weren't any warning messages, or stay open for you to read them if there were. <hr> You can configure analog by putting commands in the configuration file, <kbd>analog.cfg</kbd>. One command you will need straight away is <pre> LOGFILE logfilename # to set where your logfile lives </pre> The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. <p> There's a list of <a href="#basiccmd">basic commands</a> later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on <a href="#custom">customising analog</a>. <hr> Another way to start analog is to drag a logfile onto the analog icon, in which case analog will try to analyse it, or drag a configuration file onto the icon, in which case analog will use the commands in that configuration file. (Analog detects whether it's a configuration file or a logfile by whether it starts with a <kbd>#</kbd> or not.) This enables you to create different reports without having two copies of the application. 
<p> One note: on other platforms, there is another way to give options, via command line arguments. You'll see these mentioned in this Readme from time to time, but the Mac doesn't have a command line, so ignore these. <p> If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. (It's the same source code for all versions). <hr> <hr> <a name="startpc"><h2>Starting to use analog under Windows</h2> </a> This describes how to set up analog under Windows 95, 98 or NT. Windows 3.1 users will have to read the section on <a href="#startux">other platforms</a> instead. <p> When you've downloaded analog, and either you or your browser has unzipped it, you will find in the analog folder a configuration file called <kbd>analog.cfg</kbd> and the analog executable itself, as well as the Readme, the <a href="Licence.txt">Licence</a> (which you must read and agree to before using analog) and a couple of other files. There is no <kbd>setup.exe</kbd>: analog is already ready to run without one. <p> (Some unzip programs are broken, and do not create folders when they should. If you don't have a folder called <kbd>lang</kbd> inside the analog folder, create one and put all the files called <kbd>*.lng</kbd> and <kbd>*.tab</kbd> into it.) <p> There are two ways of running analog. You can either run it from Windows (by single-clicking or double-clicking on its icon, depending on your setup), or you can run it from the DOS command prompt (under Start-Programs). If you run it from Windows, it will create a DOS window to run in. When it's finished, it will produce an output file called <kbd>Report.html</kbd>. The first time you run it, this may all happen almost instantly. For help in interpreting the output, see <cite><a href="#meaning">What the results mean</a></cite>. 
<hr> You can configure analog by putting commands in the configuration file, <kbd>analog.cfg</kbd>. One command you will need straight away is <pre> LOGFILE logfilename # to set where your logfile lives </pre> The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. <p> There's a list of <a href="#basiccmd">basic commands</a> later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on <a href="#custom">customising analog</a>. <p> In some ways, it's easier to run analog from the DOS command prompt, because you get to see any error or warning messages more easily. Also, if you run analog from the command prompt, there is another way to give options, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands. You can use the command line arguments if you run analog from a batch file too. <p> If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. (It's the same source code for all versions). <hr> <hr> <a name="startos2"><h2>Starting to use analog under OS/2</h2> </a> When you've downloaded analog, and either you or your browser has unzipped it, you will find in the analog directory a configuration file called <kbd>analog.cfg</kbd> and the analog executable itself, as well as the Readme, the <a href="Licence.txt">Licence</a> (which you must read and agree to before using analog) and a couple of other files. You can run analog by just typing <kbd>analog</kbd>. It should produce an output file called <kbd>Report.html</kbd>. For help in interpreting the output, see <cite><a href="#meaning">What the results mean</a></cite>. 
<hr> You can configure analog by putting commands in the configuration file, <kbd>analog.cfg</kbd>. One command you will need straight away is <pre> LOGFILE logfilename # to set where your logfile lives </pre> You need to use <kbd>\</kbd> not <kbd>/</kbd> as the directory separator in the logfile name. The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. <p> There's a list of <a href="#basiccmd">basic commands</a> later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on <a href="#custom">customising analog</a>. <p> There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands. <p> If you want to compile your own version of analog (it's written in C), or just to read the source code, it's available from the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. (It's the same source code for all versions). There are <a href="#compileOS2">instructions about compiling</a> on another page. <hr> <hr> <a name="startux"><h2>Starting to use analog on other platforms</h2> </a> If you're not using one of the platforms for which a precompiled version of analog is available, you'll have to compile your own version from the source. But don't worry -- it's written in standard C throughout, so it will compile out of the box on most platforms. (The source code is the same for all platforms.) <p> First, you should look at the file anlghead.h, and see if there's anything you want to edit. In particular, you need to set the <kbd>ANALOGDIR</kbd>. <p> When you have done that, you need to compile the program. How to do that depends on which operating system you're using. 
<hr> <b><a name="compileux">Compiling under Unix</a></b>. First edit anlghead.h as described above. Then just type <pre> make </pre> to compile the program. On most systems, that will be sufficient. If it fails to compile, have a look in the Makefile to see if there's anything that you need to change to suit your configuration, and try again. It says in that file what to do. In particular, <b>Solaris 2 (SunOS 5)</b> users need to change the <kbd>LIBS=</kbd> line. <p> (Experts can pass some arguments in on the <kbd>make</kbd> command line instead of by editing <kbd>anlghead.h</kbd>: e.g. <pre> make DEFS='-DANALOGDIR=\"/usr/etc/apache/analog/\"' </pre> This is useful if you have a script to compile analog.) <p> If you haven't got gcc, you will need to change the compiler - try acc or cc instead. If it still doesn't compile, try <kbd>DEFS=-DNODNS</kbd> to ignore the DNS lookup code. <p> There is a known problem with <b>HP-UX 10</b> and some versions of gcc. If it complains about an error in the <kbd>&lt;sys/stat.h&gt;</kbd> library, you need to upgrade to gcc version 2.7.2.3 or later, or use HP's cc compiler. HP's compiler is not an ANSI C compiler by default, so you need to specify <kbd>-Ae</kbd> in the <kbd>CFLAGS</kbd> to tell the compiler to use ANSI C. <p> <b>SunOS 4</b>'s cc and gcc don't have the necessary header files for ANSI C. If you have the ANSI C compiler acc, use that. Otherwise use the <kbd>DEFS</kbd> given in the Makefile. <p> <b>SunOS 5</b> users need to change the <kbd>LIBS=</kbd> line in the Makefile. Also, this OS sometimes seems to have a broken <kbd>strcmp()</kbd> function. If you get an "illegal instruction" error when running analog, compile it with the <kbd>-DNEED_STRCMP</kbd> in the <kbd>DEFS=</kbd> line. <p> <b><a name="compileVMS">Compiling under OpenVMS</a></b>. First edit anlghead.h as described above. Then type <pre> MMS </pre> to compile analog. <p> <b><a name="compileRiscOS">Compiling under Acorn RiscOS</a></b>. 
The Makefile is called <kbd>Make.Risc</kbd>, and you will have to rename it to <kbd>Makefile</kbd> before running make. Also you have to make directories called <kbd>C</kbd>, <kbd>H</kbd> and <kbd>O</kbd>, and move the source files into the appropriate directories: e.g., <kbd>alias.c</kbd> must be renamed <kbd>C.alias</kbd>. And you will find that there are some filenames in the header file <kbd>anlghead.h</kbd> that you want to change to fit into the RiscOS directory structure. <p> <b><a name="compileOS2">Compiling under OS/2</a></b>. Although there is a precompiled version of analog for OS/2, if you want to compile your own you will need the <a href="ftp://hobbes.nmsu.edu/pub/os2/dev/emx/">EMX package</a>. You should edit the Makefile to have <kbd>OS=OS2</kbd> and <kbd>LIBS=-lsocket</kbd>. Then after editing anlghead.h and running Make, you need to run the command <pre> EMXBIND -b ANALOG </pre> to generate the analog.exe executable. <hr> After you've made the program, just type <pre> analog </pre> to run the program. (Or <kbd>./analog</kbd> if for some reason <kbd>.</kbd> isn't in your <kbd>$PATH</kbd>.) <p> You can configure analog by putting commands in the configuration file, which is called <kbd>analog.cfg</kbd> by default. Two commands you will need straight away are
<pre>
LOGFILE logfilename        # to set where your logfile lives
OUTFILE outputfile.html    # to send the output to a file instead of the screen
</pre>
The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet. There's a sample logfile supplied with the program. <p> There's a list of <a href="#basiccmd">basic commands</a> later in the Readme. Also there are a few to get you started in the configuration file already, but there are lots of others available. You can read about all the commands in the section on <a href="#custom">customising analog</a>. For help in interpreting the output, see <cite><a href="#meaning">What the results mean</a></cite>. 
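<p> As a starting point, a first <kbd>analog.cfg</kbd> might look something like the sketch below. It uses only commands described in the <a href="#basiccmd">basic commands</a> section; the file names, organisation name, URL and image directory are placeholders which you should replace with your own, and the three reports are an arbitrary selection.
<pre>
LOGFILE access.log                 # placeholder: the logfile to analyse
OUTFILE report.html                # placeholder: where to write the report
HOSTNAME "Example Organisation"    # placeholder: name shown at the top of the report
HOSTURL http://www.example.com/    # placeholder: URL that goes with that name
IMAGEDIR /analog/images/           # placeholder: where the images live on your server
GENERAL ON                         # the General Summary at the top
MONTHLY ON                         # one line for each month
REQUEST ON                         # which files were requested
</pre>
Each command is on its own line, and everything after a <kbd>#</kbd> is a comment.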
<p> There is one other way to give options to analog, via command line arguments, given on the command line after the program name. These are just shortcuts for configuration file commands. <hr> <hr> <a name="custom"><h2>Customising analog</h2> </a> This section is the bulk of the Readme. It tells you all the commands you can give to analog, and what they all do. First there's a list of <ul> <li><a href="#basiccmd">basic commands</a> </ul> which is as much as beginners need to read, until they want to do something which isn't listed there, or are curious to find out what they could do. <p> The following section is a technical (i.e., dull but important) section on the <ul> <li><a href="#syntax">syntax of configuration commands</a>. </ul> Then there's documentation on all the configuration commands in the following categories. Analog has over 200 configuration commands and over 40 command line options, so sometimes these sections turn into lists of commands. But here's where you find out everything you can do with analog. <p> Later there's an <a href="#indx">index</a> of all the commands and topics, and also a <a href="#quickref">quick reference</a> containing the syntax of all the commands and examples. 
<ul> <li><a href="#logfile">Choosing a logfile</a> <li><a href="#logfmt">Specifying a log format</a> <li><a href="#alias">Aliases</a> <li><a href="#include">Inclusions and exclusions</a> <li><a href="#args">Search arguments</a> <li><a href="#output">Configuring the output</a> <li><a href="#timereps">Time reports</a> <li><a href="#othreps">Other reports</a> <li><a href="#hierreps">Hierarchical reports</a> <li><a href="#domfile">The domains file</a> <li><a href="#compout">Computer-readable output</a> <li><a href="#cache">Cache files</a> <li><a href="#dns">DNS lookups</a> <li><a href="#lowmem">Coping with low memory</a> <li><a href="#debug">Debugging</a> <li><a href="#form">Form interface and CGI program</a> </ul> <hr> <hr> <a name="basiccmd"><h2>Basic commands</h2> </a> Here is a list of basic configuration commands to get you started with analog. These commands should be added to your configuration file, <kbd>analog.cfg</kbd>, as explained in the section on <cite><a href="#start">Starting to use analog</a></cite>. We'll see all the possible configuration commands in later sections. Or you can read a summary of the commands which control each report in the section on <cite><a href="#reports">Analog's reports</a></cite>. <hr> Analog reads logfiles produced by your web server, and produces an output file based on the data in them. So you need to know how to specify which logfile to read, and which file to send the output to. The relevant commands look like
<pre>
LOGFILE my_logfile
OUTFILE output.html
</pre>
where, of course, you should substitute the names of the files you want to use. The logfile must be stored locally -- analog won't use FTP or HTTP to fetch it from the internet, so you may have to fetch it yourself first. You can read several logfiles by giving several logfile commands, or by giving a comma-separated list, or by using wild cards in the logfile name. 
So, for example, if you use the commands
<pre>
LOGFILE new1.log,old*.log
LOGFILE new2.log
</pre>
analog will analyse the logfiles <kbd>new1.log</kbd>, <kbd>new2.log</kbd>, and all the old logfiles. Analog will recognise logfiles in several different formats. You can read more about this in the section on <cite><a href="#logfile">Choosing a logfile</a></cite>. <hr> There are a couple of other commands you need to know right at the beginning, not because they're particularly important in themselves, but because the output will look silly if you don't know them. First, you need to know how to put your own organisation's name and URL at the top of the report. For this, you need two commands such as
<pre>
HOSTNAME "Spam Widgets Inc."
HOSTURL http://www.spam-widgets.com/
</pre>
<p> If you have broken images in the output instead of graphs, you need to say in which directory on your server the images are stored. You do this by a command like <pre> IMAGEDIR /analog/images/ </pre> (The images are distributed with the program - you will have to move them to whichever directory you choose.) <hr> Next you will want to know how to turn individual reports on and off. Analog can produce 32 different reports, but here are the most important. Try them and see what happens. You can turn each report on with an <kbd>ON</kbd> command, or off with an <kbd>OFF</kbd> command. You can also use the commands <kbd>ALL ON</kbd> and <kbd>ALL OFF</kbd> to turn all reports on or off. 
<pre>
MONTHLY ON        # one line for each month
WEEKLY ON         # one line for each week
FULLDAILY ON      # one line for each day
DAILY ON          # one line for each day of the week
HOURLY ON         # one line for each hour of the day
GENERAL ON        # the General Summary at the top
REQUEST ON        # which files were requested
FAILURE ON        # which files were not found
DIRECTORY ON      # Directory Report
HOST ON           # which computers requested files
ORGANISATION ON   # which organisations they were from
DOMAIN ON         # which countries they were in
REFERRER ON       # where people followed links from
FAILREF ON        # where people followed broken links from
SEARCHQUERY ON    # the phrases and words they used...
SEARCHWORD ON     # ...to find you from search engines
BROWSER ON        # which browsers people were using
OSREP ON          # and which operating systems
FILETYPE ON       # types of file requested
SIZE ON           # sizes of files requested
STATUS ON         # number of each type of success and failure
</pre>
The referrer and browser reports will only appear if your server records the necessary information. You can configure lots of other things about each report, such as how many rows are listed, which columns are included, and how the reports are sorted. For example, the command <pre> REQINCLUDE pages </pre> tells analog only to list pages, rather than all files, in the request report. You can read a summary of all the reports and the commands which control them in the section on <cite><a href="#reports">Analog's reports</a></cite>. <hr> You can have the output in several different languages, by using a <kbd>LANGUAGE</kbd> command. For example, the command <pre> LANGUAGE FRENCH </pre> will give you the output in French. 
The available languages at the moment are <kbd>ARMENIAN</kbd>, <kbd>BOSNIAN</kbd>, <kbd>CATALAN</kbd>, <kbd>SIMP-CHINESE</kbd> (GB2312 encoding), <kbd>TRAD-CHINESE</kbd> (Big5 encoding), <kbd>CZECH</kbd>, <kbd>DANISH</kbd>, <kbd>DUTCH</kbd>, <kbd>ENGLISH</kbd>, <kbd>US-ENGLISH</kbd>, <kbd>FINNISH</kbd>, <kbd>FRENCH</kbd>, <kbd>GERMAN</kbd>, <kbd>GREEK</kbd>, <kbd>ITALIAN</kbd>, <kbd>JAPANESE</kbd>, <kbd>NORWEGIAN</kbd> (Bokmål), <kbd>NYNORSK</kbd>, <kbd>POLISH</kbd>, <kbd>PORTUGUESE</kbd>, <kbd>BR-PORTUGUESE</kbd>, <kbd>RUSSIAN</kbd>, <kbd>SERBIAN</kbd>, <kbd>SLOVAK</kbd>, <kbd>SLOVENE</kbd>, <kbd>SPANISH</kbd>, <kbd>SWEDISH</kbd>, <kbd>TURKISH</kbd> and <kbd>UKRAINIAN</kbd>. See the section on <cite><a href="#LANGUAGE">Configuring the output</a></cite> for how to download, or even translate, new languages. <p><i>Note: The following additional languages were available in version 3 of analog: <kbd>HUNGARIAN</kbd>, <kbd>ICELANDIC</kbd>, <kbd>KOREAN</kbd>, <kbd>LATVIAN</kbd>, <kbd>LITHUANIAN</kbd> and <kbd>ROMANIAN</kbd>. I hope that they will be available for this version soon. As they are translated, they will be added to the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. Version 3 of analog will also be available at the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a> for a while, if you need one of these languages.</i> <hr> Two other common things you might want to do are to <i>alias</i> files or hosts (for example, to tell analog that two different filenames are really the same file), or to <i>include</i> or <i>exclude</i> certain files, hosts or dates (to ignore accesses from your site, for example, or to do an analysis only of a certain subdirectory or a certain time period). For these, see the later sections on <cite><a href="#alias">Aliases</a></cite> and <cite><a href="#include">Inclusions and exclusions</a></cite>. <p> As I said, these are only a few of the commands available. 
To find out about all the commands, you'll have to read the remaining sections of the Readme, starting with a short section on the <a href="#syntax">syntax of configuration commands</a>. <hr> <hr> <a name="syntax"><h2>Syntax of configuration commands</h2> </a> This section describes how analog finds configuration commands, and what the syntax of a configuration file should be. The syntax of individual commands is given in the <cite><a href="#quickref">Quick reference</a></cite> section later. <hr> When analog starts up, it first reads options from configuration files and the command line (assuming that you are running analog from an operating system with a command line). Defaults for many of these options will have already been set in the files <kbd>anlghead.h</kbd> and <kbd>anlghea2.h</kbd> at the time the program was compiled. So if you compile your own version of analog, rather than downloading a pre-compiled executable, you can also set some options in those files before compiling. Those options are all documented there. <hr> <a name="specialcfgs">The first file</a> which analog reads is the <i>default configuration file</i>, normally called <kbd>analog.cfg</kbd>. You can stop this file being read by specifying the option <kbd>-G</kbd> on the command line. Then the command line arguments are read, in the order in which they appear. Finally, the <i>mandatory configuration file</i> is read, if you specified one when you compiled the program. This is a configuration file which cannot be overridden by the user: if it is not found, analog exits immediately. This allows a system administrator to prevent users analysing certain files or producing certain reports, for example. <i><strong>However</strong></i>, note that the only certain way to prevent users analysing things is to deny them access to the logfile. Otherwise there is nothing to stop them analysing the logfile using another copy of analog or another program. 
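<p> As a sketch of this ordering, consider the command line below. The file name <kbd>site.cfg</kbd> is a placeholder, and the <kbd>+g</kbd> argument used to name it is described in the next part of this section.
<pre>
analog -G +gsite.cfg
</pre>
Here <kbd>-G</kbd> stops the default configuration file being read, so the commands in <kbd>site.cfg</kbd> are read from the command line, and the mandatory configuration file (if one was specified when the program was compiled) is read last of all.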
<hr> <a name="CONFIGFILE">You can include</a> another configuration file from the command line by using a command like <kbd>+gother.cfg</kbd>. (Note that there is no space between <kbd>+g</kbd> and the filename; this is true of all command line arguments.) You can also include another configuration file from within a configuration file by a command like <pre> CONFIGFILE other.cfg </pre> The commands in the other configuration file are read immediately, in order. The program then continues reading the command line or calling configuration file where it left off. Note that reading an alternative configuration file does <b>not</b> stop the default configuration file (usually <kbd>analog.cfg</kbd>) being read as well. To do that you have to specify <kbd>-G</kbd> as well as the <kbd>+g</kbd> command. Also, note that reading in several configuration files does <b>not</b> produce several reports, but a single report based on all the options. <p> In the Mac version, you can start up a program with a particular configuration file instead of the default one by dragging the configuration file onto the analog icon. The file must start with a <kbd>#</kbd>. <p> <a name="plusC">You can also</a> specify any configuration command on the command line even if it doesn't have a command line abbreviation, by use of the <kbd>+C</kbd> command. For example, <kbd>+C"UNCOMPRESS *.gz gzcat"</kbd> will include that command. <hr> <a name="commandsyntax">Here are the syntax rules</a> for configuration commands. A configuration file contains several commands on separate lines; any text after a hash (<kbd>#</kbd>) on a line is ignored as a comment. Each command consists of the command name followed by one or two arguments. An argument to a command may optionally be placed in single or double quotes or parentheses, and it must be if the argument contains a hash or a space. So, for example, here are some valid configuration commands. 
<pre>
DAILY OFF                       # We don't want a daily summary
FULLDAILY "ON"                  # We want a full daily report instead
HOSTNAME (Spam Widgets Inc.)    # Spaces, so quotes or brackets needed
</pre>
Generally later commands override earlier ones if you can have only one of that thing (e.g., for the <kbd>OUTFILE</kbd>), or supplement them if you can have several (e.g., for the <kbd>LOGFILE</kbd>, because you can read several logfiles). Apart from that, the order of commands doesn't matter, except that <kbd><a href="#logfmt">LOGFORMAT</a></kbd> and <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> commands must come earlier in the same configuration file than the <kbd>LOGFILE</kbd> to which they refer. <hr> <a name="settings">If all the options</a> seem a bit confusing, just run <pre> analog -settings [other options] </pre> or include <kbd>SETTINGS ON</kbd> in the configuration commands. That will tell you what the values of all the variables will be, based on the defaults in <kbd>anlghead.h</kbd> and <kbd>anlghea2.h</kbd>, the configuration commands, and the command line options. If you're on Unix or Windows, remember that you can send the output to a file with <pre> analog -settings &gt; file </pre> <hr> <hr> <a name="logfile"><h2>Choosing a logfile</h2> </a> The basic command for selecting a logfile is <pre> LOGFILE logfilename </pre> or just to put the logfile name on the command line without any arguments, e.g., <kbd>analog logfilename</kbd>. A <kbd>-</kbd> sign or the word <kbd>stdin</kbd> is interpreted as standard input: this is useful on Unix systems for constructing pipes. All logfiles must be within your computer's file system (on disk, or at least mounted under Unix, or on a mapped drive under NT) -- analog won't use FTP or HTTP to fetch them from the internet. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon. <p> You can have several <kbd>LOGFILE</kbd> commands. 
You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read <kbd>logfile1</kbd>, <kbd>c:\logs\logfile2</kbd>, and all files ending in <kbd>.log</kbd>:
<pre>
LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
</pre>
Or if you were on a Mac, you might use something like <pre> LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*" </pre> The <kbd>LOGFILE</kbd> commands are cumulative, except that any logfiles on the command line or in user-specified configuration files override any in the <a href="#specialcfgs">default configuration file</a>, and are themselves overridden by any in the <a href="#specialcfgs">mandatory configuration file</a>. There is also the special command <pre> LOGFILE none </pre> which erases the list of logfiles specified so far. <hr> Analog knows about several different types of logfile. By default it will attempt to see if your logfile is of one of the types it knows about, based on the first line. The types it can usually diagnose are the common log format, the NCSA combined format, referrer log and browser log, the W3 extended log format, the Microsoft IIS format, the Netscape format, the WebSTAR format and the WebSite format. <a href="#formats">Examples of all these formats</a> are given at the end of this section. If you have <a href="#debugs">debugging</a> on, analog will report what type of logfile it thinks yours is. <p> If your logfile is not in one of the standard formats, you will probably still be OK, because it is possible to tell analog about other formats using a <kbd>LOGFORMAT</kbd> command. This is explained in the <a href="#logfmt">next section</a>. But most users don't ever need to know about this because they have logfiles in a standard format. So the best thing to do is just to try analysing your logfile and see if analog will understand it. 
If it does, you don't need to worry about <kbd>LOGFORMAT</kbd>s. <p> <a name="corruptlines">If analog can't understand</a> your logfile, it will warn you that it can't detect the format, or possibly that it found a lot of corrupt lines. There are basically four reasons why this might happen: <ol> <li>Some log formats are not very well designed and analog can't analyse them reliably. In this case it will give up, usually with a helpful message, rather than risk doing a bad job. For example, you might get "<em>Logfile with ambiguous dates</em>" or "<em>Time without date</em>." In this case you should read the <a href="#formats">notes on all the built-in formats</a> below where some common problems with those formats are described. <li>Since analog tries to deduce the format based on the first line of the logfile, it could just be that the first line is corrupt. In this case, you could tell analog the format, or you could just fix the first line. <li>For the same reason, if the format changes midway through the log, analog will count the remaining lines as corrupt. In this case, you will find that your report contains a partial analysis but with a large number of corrupt lines too. You will need to give analog two <kbd>LOGFORMAT</kbd> commands to tell it about the two different formats. <li>Finally, some logfiles really aren't in one of the standard formats. In this case you will need to <a href="#logfmt">read the next section</a> and learn how to tell analog about your format. </ol> <hr> <a name="secondarg">There's also a second argument</a> to the logfile command, which specifies a prefix to add to all the filenames in that logfile. This is useful if you've got several different servers or virtual hosts, when the same filename may occur on each of the servers. The argument can contain a <kbd>%v</kbd>, and the name of the virtual host will then be inserted at that point. 
For example, <pre> LOGFILE log1,log2 http://www.%v.mydomain.com </pre> would translate a filename <kbd>/file.html</kbd> with virtual host <kbd>host1</kbd> in <kbd>log1</kbd> or <kbd>log2</kbd> to <kbd>http://www.host1.mydomain.com/file.html</kbd>. If you are using the second argument to the <kbd>LOGFILE</kbd> command, you will probably want to use the <kbd><a href="#hierreps">SUBDIR</a></kbd> command as well. <p> If <kbd>%v</kbd> is included in the argument and the logfile line doesn't have a virtual host, that line will be marked as corrupt. If <kbd><a href="#lowmem">VHOSTLOWMEM 3</a></kbd> is specified, the <kbd>%v</kbd>'s will not be translated and will just appear as <kbd>%v</kbd> in the output. <hr> <a name="UNCOMPRESS">It is often convenient</a> to store logfiles compressed to save disk space. Analog on the Mac can read logfiles compressed using gzip. And analog on Unix and Win32 can read compressed logfiles provided that you use an <kbd>UNCOMPRESS</kbd> command to say how to uncompress them. You need to supply the types of file that you want to uncompress in a comma-separated list, together with the name of a command that will uncompress the files to standard output (rather than to a file). For example, on Unix you might use <pre> UNCOMPRESS *.gz,*.Z /usr/bin/gzcat </pre> whereas on Windows NT, you might use <pre> UNCOMPRESS *.gz ("c:\Program Files\gzip\gzip" -cd) </pre> This would be a suitable command to include in the <a href="#specialcfgs">default configuration file</a>. <p> If analog determines when it starts to uncompress a logfile that that file isn't wanted for the analysis, two undesirable things can happen. Either the program might pause until the logfile is fully uncompressed, or there might be a "broken pipe" error reported. This is system dependent, and out of analog's control. <hr> <h3><a name="formats">Logfile</a> formats</h3> Here is a summary of the various logfile formats which analog knows about. 
To illustrate them, I have used the same (fictional) request as it might be recorded in the different formats. <p> <a name="commonfmt">The common</a> logfile format is written by most servers. Its lines look like <pre> jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 </pre> (except all on one line). Some versions of Microsoft software have a buggy version of this with an extra quote mark before the <kbd>HTTP</kbd> like this: <pre> jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243 </pre> Analog will understand these, but (as with any two formats) it will reject lines if the format changes half way through. <hr> <a name="reffmt">The NCSA referrer log</a> looks like <pre> [25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/ </pre> <a name="browfmt">and the browser (or agent) log</a> looks like <pre> [25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05) </pre> In the referrer log, the date can be omitted. <hr> <a name="combinedfmt">The NCSA combined log</a> is the same as the common log, except that it has the referrer and browser on the end in quotes, like this: <pre> jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)" </pre> (except all one line). If you are using the Apache server, you can generate this with the <kbd>mod_log_config</kbd> module, using the command <pre> LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" </pre> It is usually better to use the combined log than separate logs, because it stores more information in less space. <hr> <a name="IISfmt">The Microsoft IIS logfile</a> looks like <pre> 192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -, </pre> (except all on one line). 
However, the format is extremely badly designed, in that the date follows local conventions: in other words, in North America the above example would have the date <kbd>12/25/98</kbd> instead. Analog will diagnose which form the logfile is in if possible: but if both the date and the month are at most 12, there is no way to tell which format it is. In this case, it will advise you to use the command <kbd>LOGFORMAT MICROSOFT-NA</kbd> for North American date format, or <kbd>LOGFORMAT MICROSOFT-INT</kbd> for international date format. In some countries, the date will not be in either of these formats, in which case you need to write your own <kbd>LOGFORMAT</kbd> command. <p> There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. But they all do it in different ways, so analog can't automatically diagnose them, and again, you need to write a <kbd>LOGFORMAT</kbd> command for them. <hr> <a name="websitefmt">The WebSite format</a> looks like <pre> 12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178 </pre> (except all on one line, and with the fields separated by tabs). It suffers from the same problem with ambiguous dates as the IIS logfile (above), so again you might have to use <kbd>LOGFORMAT WEBSITE-NA</kbd> or <kbd>LOGFORMAT WEBSITE-INT</kbd>, or even have to write your own <kbd>LOGFORMAT</kbd> command. <hr> <a name="restfmts">The W3 extended log</a>, the Netscape log, and the WebSTAR log can be recognised because they <b>must</b> include at or near the top a line telling analog what format to expect on subsequent lines. (They may also contain later lines changing the format). If the header line is missing, analog won't be able to interpret the subsequent lines and so won't be able to analyse the logfile. 
In this case, you will have to either replace the missing header or use a <kbd>LOGFORMAT</kbd> command to tell analog your format. <p> If analog finds that the header line is corrupt, it will usually tell you what was wrong with it. Here are two common problems. First, the header line mustn't contain the same item twice, even under two different names. (This is because analog doesn't know which one you want to use.) If it does contain the same item twice, you will have to use a <kbd>LOGFORMAT</kbd> command to tell analog which one you want to ignore. <p> <a name="dateonly">Secondly</a>, you're not allowed the time without the date or vice versa -- in particular, having the date just at the top of the logfile is not sufficient; you must have it on each line. Microsoft servers produce extended logs with the date only at the top. But if the date changes during the logfile, the server doesn't then write a new date line. For this reason analog can't analyse such logfiles safely. There are some programs on the <a href="#helpers">helper applications page</a> to put the date on each line. If you already have such a logfile you might want to use one of these programs, but they have to assume that the date doesn't change during the logfile, so it would be safer to tell your server to log in a better format in future. <p> <a name="extendedfmt">The extended log</a> is described at <a href="http://www.w3.org/TR/WD-logfile.html">http://www.w3.org/TR/WD-logfile.html</a>. Its header line looks like <pre> #Fields: date time cs-uri </pre> In the rest of the logfile, the fields can be separated by spaces or tabs. There is also Microsoft's attempt at the extended format -- unfortunately they didn't read the spec., so they didn't enclose the browser and referrer in quotes, they replaced spaces in the browser name with <kbd>+</kbd>'s, and they put the time taken to serve the request in milliseconds instead of seconds.
Extended logs always record the time in GMT, so you will probably need to use a <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> command to convert to your local timezone. <p> <a name="webstarfmt">The WebSTAR format</a> is described at <a href="http://www.starnine.com/webstar/docs/ws4manual.3f.html">http://www.starnine.com/webstar/docs/ws4manual.3f.html</a>. It has a header line like <pre> !!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME </pre> In the rest of the logfile, the fields are separated by tabs. The WebSTAR server also records the time in GMT, so again you will probably need to use a <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> command to convert to your local timezone. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too. <p> <a name="netscapefmt">Finally, the Netscape</a> header line looks like <pre> format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length% </pre> <hr> <hr> <a name="logfmt"><h2>Specifying a log format</h2> </a> This section is about how to tell analog the format of your logfile. I'll assume that you've read the <a href="#logfile">previous section</a>, and have decided that you need to specify the log format explicitly, because analog can't detect the format of your logfile itself for some reason. <p> The basic command to specify a log format looks like <pre> LOGFORMAT format </pre> -- we'll discuss what the formats can be in a minute. Or if you are using the Apache server, you will probably find it more convenient to use <pre> APACHELOGFORMAT format </pre> instead. <p> The <kbd>LOGFORMAT</kbd> and <kbd>APACHELOGFORMAT</kbd> commands only apply to logfiles specified with a <kbd>LOGFILE</kbd> command <em>later</em> in the <em>same</em> configuration file. So you must put the <kbd>LOGFORMAT</kbd> above the <kbd>LOGFILE</kbd> to which it refers. 
This way, different logfiles can have different formats, like this: <pre>
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
</pre> In this example, <kbd>log1</kbd> is in <kbd>format1</kbd>, <kbd>log2</kbd> and <kbd>log3</kbd> are in <kbd>format2</kbd>, and <kbd>log0</kbd> isn't in either format -- analog will try and detect which format it's in. <hr> <a name="Apache">The <kbd>APACHELOGFORMAT</kbd> command</a> is followed by the <kbd>LogFormat</kbd> from your Apache <kbd>httpd.conf</kbd> file. For example, common format could be represented by <pre>
APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b)
</pre> (The parentheses are needed because the argument contains spaces.) Analog understands all Apache log formats, with the exception that it won't parse Apache's <kbd>"%...{format}t"</kbd> construction for customised times: if you have this construction, you will have to use ordinary <kbd>LOGFORMAT</kbd> instead. <hr> <a name="fmtsyntax">The possible formats</a> for use with the <kbd>LOGFORMAT</kbd> command are of two types. First there are some symbolic words, and then there are <i>log format strings</i>. We'll look at the words first. <p> <a name="fmtwords">There are format words</a> for all the built-in formats analog knows about. You might need one of these words if your logfile is in a standard format, but analog can't detect which format it's in for some reason; for example, maybe the first line is corrupt; or maybe analog can't tell whether you're using North American or international dates.
So for example <pre> LOGFORMAT COMMON </pre> will select common format; you can also have <kbd>COMBINED</kbd>, <kbd>REFERRER</kbd>, <kbd>BROWSER</kbd>, <kbd>EXTENDED</kbd>, <kbd>MICROSOFT-NA</kbd> (North American date format), <kbd>MICROSOFT-INT</kbd> (international date format), <kbd>WEBSITE-NA</kbd>, <kbd>WEBSITE-INT</kbd>, <kbd>MS-EXTENDED</kbd> (Microsoft's attempt at extended format), <kbd>MS-COMMON</kbd> (a buggy version of common format in some versions of Microsoft software), <kbd>NETSCAPE</kbd> or <kbd>WEBSTAR</kbd>. All these formats were defined at the end of the <a href="#formats">previous section</a>. You can also use the special word <kbd>AUTO</kbd> to return to automatic detection. <p> <a name="fmtstrings">If your logfile</a> is not in one of the recognised formats, you can tell analog about your format using a log format string. You only ever need this if your logfile has lines which are not in one of the standard formats. (And even if it isn't in a standard format, if you're using the Apache web server, you will find <kbd><a href="#Apache">APACHELOGFORMAT</a></kbd> easier.) <p> The format string consists of a template for the logfile line, with the various fields and special characters replaced by codes as follows. Please note that these codes are case sensitive -- for example, <kbd>%b</kbd> is completely different from <kbd>%B</kbd>! 
<dl compact> <dt><kbd>%S</kbd><dd>host (computer making the request) <dt><kbd>%r</kbd><dd>file requested <dt><kbd>%B</kbd><dd>browser <dt><kbd>%A</kbd><dd>browser with <kbd>+</kbd>'s instead of spaces <dt><kbd>%f</kbd><dd>referrer (URL referring to the file) <dt><kbd>%u</kbd><dd>user (tip: a cookie can usefully be defined as <kbd>%u</kbd> too) <dt><kbd>%v</kbd><dd>virtual host (also called virtual domain) <dt><kbd>%d</kbd><dd>day of the month <dt><kbd>%m</kbd><dd>month in digits <dt><kbd>%M</kbd><dd>month, three letter English abbreviation <dt><kbd>%y</kbd><dd>year, last two digits <dt><kbd>%Y</kbd><dd>year, four digits <dt><kbd>%h</kbd><dd>hour of the day <dt><kbd>%n</kbd><dd>minute of the hour <dt><kbd>%a</kbd><dd><kbd>a</kbd> or <kbd>A</kbd> for am, or <kbd>p</kbd> or <kbd>P</kbd> for pm, if <kbd>%h</kbd> is in the 12-hour clock. (So to match "am" you need <kbd>%am</kbd> and to match "AM" you need <kbd>%aM</kbd>) <dt><kbd>%U</kbd><dd>"Unix time" (seconds since beginning of 1970, GMT) <dt><kbd>%b</kbd><dd>number of bytes transferred <dt><kbd>%t</kbd><dd>processing time in seconds <dt><kbd>%T</kbd><dd>processing time in milliseconds <dt><kbd>%c</kbd><dd>HTTP status code <dt><kbd>%q</kbd><dd>query string (part of filename after <kbd>?</kbd>, if recorded in a separate field) <dt><kbd>%j</kbd><dd>junk: ignore this field (field can be empty too) <dt><kbd>%w</kbd><dd>white space: spaces or tabs <dt><kbd>%W</kbd><dd>optional white space <dt><kbd>%%</kbd><dd><kbd>%</kbd> sign <dt><kbd>\n</kbd><dd>new line <dt><kbd>\t</kbd><dd>tab stop <dt><kbd>\\</kbd><dd>single backslash </dl> So for example, the common log format, which looks like <pre> jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 </pre> (except all on one line) could be represented by the <kbd>LOGFORMAT</kbd> command <pre> LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b) </pre> In other words, it's just the sample line but with the hostname replaced by <kbd>%S</kbd>, the 
username by <kbd>%u</kbd> etc. (The parentheses are needed because the argument contains spaces.) Or take another example: if you had lines which looked like <pre>
Fri 25/12/98 5:45pm, /~sret1/, jay.bird.com, 200, 1243, http://www.site.com, Mozilla/2.0 (X11; I; HP-UX A.09.05)
</pre> (all on one line again), you could use the format <pre>
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
</pre> <hr> A logfile can sometimes have lines in several different formats. So you can specify several <kbd>LOGFORMAT</kbd> commands in a row, and they will all apply to the next logfile. This is also useful if the format of your logfile changes half way through. So in this example: <pre>
LOGFORMAT COMMON
LOGFORMAT COMBINED
LOGFILE log1
LOGFORMAT (%j %d/%m/%y %h:%n%am, %r, %S, %c, %b, %f, %B)
LOGFILE log2
LOGFILE log3
</pre> <kbd>log1</kbd> has lines in both common and combined format, whereas <kbd>log2</kbd> and <kbd>log3</kbd> have lines just in the format in the previous example. <p> If you specify several formats, analog tries to match each line to the first format first, then if that fails the next, and so on, so the order of the formats is important. Usually you want to specify the most common one first, to minimise the time spent trying to match lines to inappropriate formats. <hr> <a name="DEFAULTLOGFORMAT">I suggested above</a> that any logfile which doesn't have a <kbd>LOGFORMAT</kbd> command earlier in the same configuration file is auto-detected. But this isn't quite true. Actually such logfiles get a special format called the <em>default log format</em>. The default format starts off as auto-detection, but you can change it if you want with the <kbd>DEFAULTLOGFORMAT</kbd> command. This command works exactly the same as the <kbd>LOGFORMAT</kbd> command -- it understands the same formats, and if you have several <kbd>DEFAULTLOGFORMAT</kbd> commands, they accumulate in the same way. The difference is that they don't need to be put in any particular place.
(There is also <kbd>APACHEDEFAULTLOGFORMAT</kbd>, which has the same effect but uses the Apache LogFormat strings.) <p> So let's go back to the first example: <pre>
LOGFILE log0
LOGFORMAT format1
LOGFILE log1
LOGFORMAT format2
LOGFILE log2
LOGFILE log3
</pre> Here <kbd>log0</kbd> actually gets the default log format. If there are no <kbd>DEFAULTLOGFORMAT</kbd> commands, the default will be auto-detection. But if there are <kbd>DEFAULTLOGFORMAT</kbd> commands, even in another configuration file, that will be the format of <kbd>log0</kbd>. <p> The times you need to use the <kbd>DEFAULTLOGFORMAT</kbd> instead of the <kbd>LOGFORMAT</kbd> are if you want to change the format of logfiles which aren't given in a <kbd>LOGFILE</kbd> command -- for example, ones specified on the command line, or dragged onto the program icon on a Mac, or compiled in. It is also useful to use the <kbd>DEFAULTLOGFORMAT</kbd> if your logfiles are always in the same format, so that you don't have to worry about putting in enough <kbd>LOGFORMAT</kbd>s in the right places. <hr> <a name="fmtmisc">A couple more technical details</a> and tips about <kbd>LOGFORMAT</kbd> commands. <p> The "Unix time", <kbd>%U</kbd>, is always recorded in GMT. So you will probably need to use a <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> command to convert to your local timezone. Also, it's just the integer part of the time, so if you have decimals you will have to use <kbd>%U.%j</kbd> . <p> The log formats which analog can handle are those which are known as <i>instantaneously decipherable</i>: in practice, this means that the character which terminates a string can never occur in the string.
So for example, in common format, which looks like <pre> LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b) </pre> if the hostname ever contained a space, the line would be marked as corrupt, because analog terminates the host at the first space, <em>not</em> at the first occurrence of space-dash-space, and then the rest of the line wouldn't match. Of course, hostnames should never contain spaces, so this shouldn't be a problem. There are a couple of other restrictions: if there is any date or time information, then the year, month, date, hour and minute must all be present: and the same information may not occur twice in the format (so you can't have both <kbd>%m</kbd> and <kbd>%M</kbd>, for example, because these both represent the month; make one of them a <kbd>%j</kbd> to have it ignored). <p> <a name="starredfmt">Sometimes</a> you need to read one of the fields in a logfile, but not analyse it. For example, if you have a separate common log and referrer log, the referrer log might look like <pre> http://guide-p.infoseek.com/Titles -> /~sret1/analog/ </pre> But the requests for <kbd>/~sret1/analog/</kbd> would already have been counted when reading the main logfile, so you don't want to count them again now. You get round this by specifying a <kbd>*</kbd> in that item in the format string, like this: <pre> LOGFORMAT (%f -> %*r) </pre> <p> A tip: sometimes it is more efficient to specify two or more adjacent fields to ignore with a single <kbd>%j</kbd>, as long as the whole group ends with a recognisable character. So common format is more efficiently specified as <pre> LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b) </pre> -- in the date and time <kbd>[25/Dec/1998:17:45:35 +0000]</kbd>, the seconds and the timezone can be ignored with a single <kbd>%j</kbd>, extending until the close-bracket. <p> Another tip: <kbd>%j</kbd> can also be used to ignore whole lines, rather than just fields analog doesn't use. 
For example, the extended log format ignores lines beginning with <kbd>#</kbd> by using <pre>
LOGFORMAT #%j
</pre> and the Microsoft format ignores lines corresponding to FTP requests with <pre>
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
</pre> If those formats had not been used, the lines would have been incorrectly marked as corrupt. <hr> <a name="fmtexamples">Finally</a>, both for reference and as examples, here is a list of all the fixed formats that analog understands, together with the example lines from the <a href="#formats">previous section</a> and their built-in definitions (each on a single line here). <dl> <dt><a name="commonfmtex">Common format</a>, <kbd>LOGFORMAT COMMON</kbd> <dd><pre>
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
</pre> <dt><a name="mscommonfmtex">Microsoft common format</a>, <kbd>LOGFORMAT MS-COMMON</kbd> <dd><pre>
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ "HTTP/1.0" 200 1243
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%w"HTTP%j" %c %b)
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b)
</pre> <dt><a name="combinedfmtex">Combined log</a>, <kbd>LOGFORMAT COMBINED</kbd> <dd><pre>
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0" 200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r%wHTTP%j" %c %b "%f" "%B")
LOGFORMAT (%S %j %u [%d/%M/%Y:%h:%n:%j] "%j%w%r" %c %b "%f" "%B")
</pre> <dt><a name="reffmtex">Referrer log</a>, <kbd>LOGFORMAT REFERRER</kbd> <dd><pre>
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
<i>or</i>
http://www.site.com/ -> /~sret1/
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %f -> %*r)
LOGFORMAT (%f -> %*r)
</pre> <dt><a name="browfmtex">Browser log</a>, <kbd>LOGFORMAT BROWSER</kbd> <dd><pre>
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
LOGFORMAT ([%d/%M/%Y:%h:%n:%j] %B)
</pre> <dt><a name="msnafmtex">Microsoft log, North American dates</a>, <kbd>LOGFORMAT MICROSOFT-NA</kbd> <dd><pre>
192.64.25.41, -, 12/25/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %m/%d/%y, %h:%n:%j, W3SVC%j, %j, %v, %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %m/%d/%y, %h:%n:%j, %j)
</pre> <dt><a name="msintfmtex">Microsoft log, international dates</a>, <kbd>LOGFORMAT MICROSOFT-INT</kbd> <dd><pre>
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10, 2178, 303, 1243, 200, 0, GET, /~sret1/, -,
LOGFORMAT (%S, %u, %d/%m/%y, %h:%n:%j, W3SVC%j, %j, %v, %T, %j, %b, %c, %j, %j, %r, %q,)
LOGFORMAT (%*S, %*u, %d/%m/%y, %h:%n:%j, %j)
</pre> <dt><a name="websitena">WebSite log, North American dates</a>, <kbd>LOGFORMAT WEBSITE-NA</kbd> <dd><pre>
12/25/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%m/%d/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
</pre> <dt><a name="websiteint">WebSite log, international dates</a>, <kbd>LOGFORMAT WEBSITE-INT</kbd> <dd><pre>
25/12/98 17:45:35 jay.bird.com host1 Server fred GET /~sret1/ http://www.site.com/ Mozilla/2.0 (X11; I; HP-UX A.09.05) 200 1243 2178
LOGFORMAT (%d/%m/%y %h:%n:%j\t%S\t%v\t%j\t%u\t%j\t%r\t%f\t%j\t%B\t%c\t%b\t%T)
</pre> </dl> The extended log, Netscape log and WebSTAR log don't have any built-in formats: analog constructs their formats from their header lines. <hr> <hr> <a name="alias"><h2>Aliases</h2> </a> <a name="CASE">After</a> analog has read each logfile entry, it then applies aliases to each of the items. First, if you have a case insensitive filesystem, analog converts the filename to lower case. Usually analog assumes that Unix and BeOS filesystems are case sensitive and other systems are case insensitive.
You might want to override its choice, if, for example, you have transferred files from one machine to another, so as to use the convention on the original machine. You can do this by the commands <pre>
CASE INSENSITIVE
CASE SENSITIVE
</pre> There are similar commands for usernames, if your logfile records these. By default, usernames are always case insensitive, but you can specify <pre>
USERCASE SENSITIVE
</pre> to override this. <hr> <a name="DIRSUFFIX">Next it</a> applies built-in aliases to each item. For example, it knows that <kbd>%7E</kbd> in a filename or referrer is equivalent to <kbd>~</kbd> and translates it accordingly. It also strips off the directory suffix from any filenames which have it. This suffix is normally <kbd>index.html</kbd>, but you can specify another one instead with a command such as <pre>
DIRSUFFIX default.htm
</pre> (You can only have one <kbd>DIRSUFFIX</kbd>.) There are other built-in aliases for other items: for example, hostnames are converted to lower case at this point. <hr> <a name="useraliases">After this</a>, it applies user-specified aliases to each item. These aliases are useful if, for example, you know that two filenames correspond to the same file, or if you want to translate local hostnames to their internet equivalents. You specify aliases by commands like <pre>
FILEALIAS /football.html /soccer.html
HOSTALIAS lion lion.statslab.cam.ac.uk
</pre> There is also the special command <kbd>FILEALIAS none</kbd>, which cancels any other file aliases which might have been specified. <p> The alias commands for the other items are called <kbd>BROWALIAS</kbd>, <kbd>REFALIAS</kbd>, <kbd>USERALIAS</kbd> and <kbd>VHOSTALIAS</kbd>. Only one alias is ever applied to any item.
So after <pre>
FILEALIAS /football.html /soccer.html
FILEALIAS /soccer.html /brazil.html
</pre> the file <kbd>/soccer.html</kbd> would get translated to <kbd>/brazil.html</kbd>, but <kbd>/football.html</kbd> would only get translated to <kbd>/soccer.html</kbd> and would not see the second alias. <p> You can also use wildcards (<kbd>?</kbd> and <kbd>*</kbd>) in alias commands. And on the right-hand side, you can use <kbd>$1</kbd>, <kbd>$2</kbd> etc. to represent the parts of the original name matched by the <kbd>*</kbd>'s. (You can use <kbd>$$</kbd> to get an actual <kbd>$</kbd> on the right-hand side.) As a special abbreviation, if there is exactly one <kbd>*</kbd> on the left-hand side, then a <kbd>*</kbd> on the right-hand side can be used to represent <kbd>$1</kbd>. So, for example, <pre>
FILEALIAS /*/football/* /soccer/
</pre> would translate <kbd>/sport/football/rules.html</kbd> to just <kbd>/soccer/</kbd>, but either of <pre>
FILEALIAS /*/football/* /$1/soccer/$2
# or
FILEALIAS /sport/football/* /sport/soccer/*
</pre> would translate <kbd>/sport/football/rules.html</kbd> to <kbd>/sport/soccer/rules.html</kbd>. <p> Analog's <kbd>*</kbd>'s are un-greedy: if there are two possible ways of matching, the part of the expression on the left matches as little as possible. This is more often what you want. But it contrasts with Perl's regular expressions, for example. (Oh, two consecutive <kbd>*</kbd>'s are completely useless, but if you try it they are collapsed into one before counting the <kbd>$1</kbd>, <kbd>$2</kbd>, etc.) <p> The behaviour of <kbd>FILEALIAS</kbd> and <kbd>REFALIAS</kbd> can be slightly unintuitive if the file has <a href="#unintuitive">search arguments</a>. <p> A warning to Unix users: if you put an <kbd>ALIAS</kbd> command on the command line with <kbd><a href="#plusC">+C</a></kbd>, the shell may try and expand <kbd>$1</kbd> etc., which is not what you want. To stop the shell doing this, put the command in single quotes instead of double quotes.
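<p> If you want to check how a wildcard alias will behave before running analog, the rules above can be imitated in a few lines of Python. This is only a sketch of the behaviour as described in this section, not analog's own code: the function names are made up for the illustration, it assumes that the first alias which matches (in configuration order) is the one applied, and it leaves an out-of-range <kbd>$n</kbd> untouched, which this Readme doesn't specify.

```python
import re


def compile_alias(pattern):
    """Turn a wildcard pattern into a regex: '*' is un-greedy, '?' is one character."""
    pattern = re.sub(r"\*+", "*", pattern)  # consecutive *'s collapse into one
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*?)")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def expand(template, groups):
    """Fill in $1, $2, ... and $$ on the right-hand side of an alias."""
    if len(groups) == 1:
        # Special abbreviation: with exactly one * on the left, * on the right means $1.
        template = template.replace("*", "$1")

    def repl(match):
        token = match.group(1)
        if token == "$":
            return "$"
        index = int(token)
        if 1 <= index <= len(groups):
            return groups[index - 1]
        return match.group(0)  # assumption: unknown $n is left as it stands

    return re.sub(r"\$(\$|\d+)", repl, template)


def apply_aliases(name, aliases):
    """Apply at most one alias to name (assumed: the first one that matches)."""
    for lhs, rhs in aliases:
        match = compile_alias(lhs).fullmatch(name)
        if match:
            return expand(rhs, match.groups())
    return name


if __name__ == "__main__":
    print(apply_aliases("/sport/football/rules.html",
                        [("/*/football/*", "/$1/soccer/$2")]))
```

Because each <kbd>*</kbd> becomes an un-greedy group, the leftmost <kbd>*</kbd> takes as little as it can, which is the behaviour described above. The result is returned straight away after the first match, so a second alias never sees the translated name.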
<hr> <a name="OUTPUTALIAS">There is another set</a> of alias commands, called <i>output aliases</i>. There is one of these for each of the reports, except the time reports. Instead of acting on items when the logfile is being read, they act on individual lines in the output. So for example, the command <pre> TYPEOUTPUTALIAS .txt ".txt (Plain text files)" </pre> would provide an explanation of that line in the file type report. <p> There can be some confusion between some <kbd>ALIAS</kbd> and <kbd>OUTPUTALIAS</kbd> commands. For example, what is the difference between <kbd>HOSTALIAS</kbd> and <kbd>HOSTOUTPUTALIAS</kbd>? In fact, there are several differences, resulting from the different times at which the aliases are processed. The <kbd>HOSTALIAS</kbd> applies to the host <i>items</i>, but the <kbd>HOSTOUTPUTALIAS</kbd> only applies to the <i>lines in the host report</i>. This means that the <kbd>HOSTALIAS</kbd> also affects the other reports which use the hosts, such as the domain report, whereas the <kbd>HOSTOUTPUTALIAS</kbd> only affects the host report. Also the <kbd>HOSTOUTPUTALIAS</kbd> applies separately to each line of the host report. This means that if two separate hosts translate to the same thing in a <kbd>HOSTALIAS</kbd> command, they will become one host ever after. But if one were to use the same <kbd>HOSTOUTPUTALIAS</kbd> commands, there would be two hosts, which would just happen to have the same name in one report. <p> In summary, <kbd>HOSTALIAS</kbd> would normally be used if a single host had two different names, so might otherwise appear to be two hosts, whereas <kbd>HOSTOUTPUTALIAS</kbd> would normally be used to annotate or clarify the host report. 
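<p> The difference in timing can be seen in a small Python sketch. It has nothing to do with analog's internals: the function names and the sample hosts are invented for the illustration. It simply counts requests twice, once renaming the hosts as they are read, and once renaming only the finished lines of the report.

```python
from collections import Counter

# Invented sample data: the same machine appears under a short and a full name.
requests = ["lion", "lion.statslab.cam.ac.uk", "lion", "tiger.statslab.cam.ac.uk"]
mapping = {"lion": "lion.statslab.cam.ac.uk"}


def with_alias(hosts, mapping):
    """Rename while reading (like HOSTALIAS): both names merge into one item."""
    return Counter(mapping.get(host, host) for host in hosts)


def with_output_alias(hosts, mapping):
    """Rename only the report lines (like HOSTOUTPUTALIAS): items stay separate."""
    counts = Counter(hosts)
    return [(mapping.get(host, host), n) for host, n in sorted(counts.items())]


if __name__ == "__main__":
    print(with_alias(requests, mapping))
    print(with_output_alias(requests, mapping))
```

In the first case there is one host with three requests; in the second there are still two hosts, with two requests and one request, which merely carry the same label in the report.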
<p> The full list of output aliases is <kbd>REQOUTPUTALIAS</kbd>, <kbd>REDIROUTPUTALIAS</kbd>, <kbd>FAILOUTPUTALIAS</kbd>, <kbd>TYPEOUTPUTALIAS</kbd>, <kbd>DIROUTPUTALIAS</kbd>, <kbd>HOSTOUTPUTALIAS</kbd>, <kbd>DOMOUTPUTALIAS</kbd>, <kbd>ORGOUTPUTALIAS</kbd>, <kbd>REFOUTPUTALIAS</kbd>, <kbd>REFSITEOUTPUTALIAS</kbd>, <kbd>REDIRREFOUTPUTALIAS</kbd>, <kbd>FAILREFOUTPUTALIAS</kbd>, <kbd>BROWOUTPUTALIAS</kbd>, <kbd>FULLBROWOUTPUTALIAS</kbd>, <kbd>OSOUTPUTALIAS</kbd>, <kbd>VHOSTOUTPUTALIAS</kbd>, <kbd>USEROUTPUTALIAS</kbd> and <kbd>FAILUSEROUTPUTALIAS</kbd>. <p> There is one known bug with <kbd>OUTPUTALIAS</kbd>. The report is sorted before the <kbd>OUTPUTALIAS</kbd> is applied. This means that if the <kbd><a href="#SORTBY">SORTBY</a></kbd> for the report is set to <kbd>ALPHABETICAL</kbd>, then the report will not be sorted correctly. <hr> <a name="aliasregexp">If you have an operating system</a> with regular expressions (only Unix??) you can include them in the <kbd>ALIAS</kbd> commands. Otherwise you might as well go straight on to the <a href="#include">next section</a>. <p> Sorry, I'm not going to teach you how to use regular expressions here if you don't already know: if you're on Unix try typing <kbd>man regex</kbd> or <kbd>man grep</kbd>. There are lots of implementations of regular expressions. The ones which analog uses are POSIX extended regular expressions, as in Unix <kbd>egrep</kbd>. If you're familiar with regular expressions from Perl, or even from GNU <kbd>grep -E</kbd>, you will not find all the same features here. <p> You include regular expressions in an <kbd>ALIAS</kbd> command by prefixing the left-hand side of the alias with "<kbd>REGEXP:</kbd>". Or you can specify a case-insensitive match, like Unix <kbd>egrep -i</kbd>, by using "<kbd>REGEXPI:</kbd>". (It's automatically case-insensitive for many items, such as hostnames, or filenames if you have specified <kbd><a href="#CASE">CASE INSENSITIVE</a></kbd>.) 
<p> On the right-hand side of the alias you can use <kbd>$1</kbd>, <kbd>$2</kbd> etc. to represent the first, second etc. bracketed expression on the left-hand side, counting in order of the left brackets. (Again, you can't put <kbd>$1</kbd>, <kbd>$2</kbd> etc. on the command line unless you put them in single quotes.) <p> Regular expressions match if they match just part of the string. If you want them to have to match the whole of the string, you have to anchor them to the ends of the string with <kbd>^</kbd> and <kbd>$</kbd>. <p> For example, <pre> REQOUTPUTALIAS REGEXP:^(/~([^/]*).*) "[$2] $1" </pre> would translate <pre> /~sret1/backgammon/rules.html</pre> to <pre> [sret1] /~sret1/backgammon/rules.html</pre> in the Request Report. Or <pre> HOSTALIAS REGEXP:^([^.]*)$ $1.mycompany.com </pre> would add <kbd>.mycompany.com</kbd> to all hostnames not containing a dot. (See the <a href="#designfaq">FAQ</a> for a discussion about whether this is a good idea.) <p> Regular expressions are greedy: if there are two possible ways of matching, the part of the expression on the left matches as much as possible. <hr> <hr> <a name="include"><h2>Inclusions and exclusions</h2> </a> After aliasing each item, analog decides whether that item is wanted or not. The whole line is only counted if all the items are wanted. Whether an item is wanted or not is determined by <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands specified by the user. These commands can be used to exclude requests from your local users, for example, or to analyse only files in a subdirectory. For example <pre> HOSTEXCLUDE mycomputer.myisp.com </pre> would exclude all requests by that computer from the statistics. <p> The rule for determining whether an item is included or excluded is as follows. All the <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands for that item are considered one by one in order, and the item is included or excluded according to the last command it matched. 
Items which don't match any of the <kbd>INCLUDE</kbd> or <kbd>EXCLUDE</kbd> commands are included if the first command was an exclusion, and excluded if the first command was an inclusion. For example, the configuration
<pre>
FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
</pre>
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand, <pre> FILEEXCLUDE /~sret1/*/img/* </pre> would analyse all files, except for images in my various directories. Note that inclusions and exclusions can contain any number of wildcards. <p> The full list of these commands is <kbd>HOSTINCLUDE</kbd> and <kbd>HOSTEXCLUDE</kbd>; <kbd>FILEINCLUDE</kbd> and <kbd>FILEEXCLUDE</kbd>; <kbd>BROWINCLUDE</kbd> and <kbd>BROWEXCLUDE</kbd>; <kbd>REFINCLUDE</kbd> and <kbd>REFEXCLUDE</kbd>; <kbd>USERINCLUDE</kbd> and <kbd>USEREXCLUDE</kbd>; and <kbd>VHOSTINCLUDE</kbd> and <kbd>VHOSTEXCLUDE</kbd>. <p> Because the inclusions and exclusions take place <em>after</em> the aliasing, the name you must use is the aliased name. (In the absence of <kbd><a href="#OUTPUTALIAS">OUTPUTALIAS</a></kbd> commands, this is the name of the item in the output.) <p> Sometimes a line doesn't contain a particular sort of item, either because there is no field reserved for it on the line, or because the browser didn't send it for that request. You can include or exclude these lines by making a special blank entry in the <kbd>INCLUDE</kbd> or <kbd>EXCLUDE</kbd> command. For example,
<pre>
USERINCLUDE jim
USERINCLUDE ""
</pre>
would include lines from user <kbd>jim</kbd> and lines without any user specified. <p> The behaviour of <kbd>REQINCLUDE</kbd> and <kbd>REFINCLUDE</kbd> can be slightly unintuitive if the file has <a href="#unintuitive">search arguments</a>. 
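<p> The rule described above (the last matching command wins, and the first command decides what happens to items that match nothing) can be sketched in Python. This is only an illustration under simplifying assumptions, not analog's own code: the names <kbd>wildcard_to_regex</kbd> and <kbd>is_wanted</kbd> are made up, <kbd>*</kbd> is treated as the only wildcard, and matching is case-sensitive.

```python
import re

def wildcard_to_regex(pattern):
    # Treat * as "any run of characters"; every other character is literal.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def is_wanted(item, commands):
    """Decide whether an item is included.

    commands is a list of (kind, patterns) pairs in configuration order,
    where kind is "INCLUDE" or "EXCLUDE" and patterns is a comma-separated
    string such as "/a/*,/b/*".
    """
    if not commands:
        return True
    # Items matching nothing: included if the first command was an
    # exclusion, excluded if the first command was an inclusion.
    wanted = commands[0][0] == "EXCLUDE"
    for kind, patterns in commands:
        if any(wildcard_to_regex(p).fullmatch(item) for p in patterns.split(",")):
            wanted = kind == "INCLUDE"  # the last matching command wins
    return wanted

# The FILEINCLUDE/FILEEXCLUDE example from the text
config = [
    ("INCLUDE", "/~sret1/*"),
    ("EXCLUDE", "/~sret1/backgammon/*,/~sret1/analog/*"),
    ("INCLUDE", "/~sret1/backgammon/*.gif"),
]
```

With this configuration, <kbd>is_wanted("/~sret1/backgammon/board.gif", config)</kbd> is true, while <kbd>/~sret1/backgammon/rules.html</kbd> and any file outside <kbd>/~sret1/</kbd> are rejected.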
<p> <a name="incregexp">On suitable operating systems</a>, you can use regular expressions for the inclusions and exclusions by prefixing the expression with "<kbd>REGEXP:</kbd>" or "<kbd>REGEXPI:</kbd>". I've already described this at length in the context of aliases, so you can <a href="#aliasregexp">look there</a> for all the details. <p> If you get confused with all the inclusions and exclusions, remember that you can always run <kbd>analog -settings</kbd> to see what the options you have specified represent. <hr> <a name="FROMTO">There is also</a> one other pair of commands which belongs in this category, namely the <kbd>FROM</kbd> and <kbd>TO</kbd> commands. These specify a time period to restrict the analysis to. The simplest usage of these commands is <kbd>FROM yyMMdd</kbd> or <kbd>FROM yyMMdd:hhmm</kbd>, where <kbd>yy</kbd> represents the last two digits of the year (analog assumes that the year is between 1970 and 2069), <kbd>MM</kbd> represents the month, <kbd>dd</kbd> is the date, <kbd>hh</kbd> the hour, and <kbd>mm</kbd> the minute. So, for example, to analyse only requests from July 1999 to June 2000 I would use the configuration
<pre>
FROM 990701
TO 000630
</pre>
Alternatively, each of the components can be preceded by <kbd>+</kbd> or <kbd>-</kbd> to represent time relative to the time at which the program was invoked. In this case, the date can have more than 2 digits. This allows constructions like
<pre>
FROM -01-00+01         # from tomorrow last year
TO -00-0131            # to the end of last month (OK even if last month
                       # didn't have 31 days)
FROM -00-00-112
TO -00-00-01           # statistics for the last 16 weeks
FROM -00-00-00:-06+01  # statistics for the last 6 hours
</pre>
There are command line abbreviations <kbd>+F</kbd> and <kbd>+T</kbd> for the <kbd>FROM</kbd> and <kbd>TO</kbd> commands; for example, <kbd>+T-00-00-01:1800</kbd> looks at statistics until 6pm yesterday. 
<kbd>-F</kbd> and <kbd>-T</kbd> turn off the from and to, as do <kbd>FROM OFF</kbd> and <kbd>TO OFF</kbd>. <hr> <a name="outputexcludes">There are also</a> <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands for most of the reports. These exclude individual lines from particular reports. So, for example, the command <pre> REFREPEXCLUDE http://your.site.com/* </pre> would exclude your internal referrers from the Referrer Report. However, it would not exclude them from the Failed Referrer Report, the Referring Site Report, etc. (you need to use <kbd>FAILREFEXCLUDE</kbd>, <kbd>REFSITEEXCLUDE</kbd> etc. for that); nor would it prevent other analysis of logfile lines with those referrers, as <kbd>REFEXCLUDE</kbd> would. Also <kbd>REFREPEXCLUDE</kbd> would include the referrers in the "not listed" line at the bottom of the report. <p> The full list of these commands is <kbd>REQINCLUDE</kbd> and <kbd>REQEXCLUDE</kbd>; <kbd>REDIRINCLUDE</kbd> and <kbd>REDIREXCLUDE</kbd>; <kbd>FAILINCLUDE</kbd> and <kbd>FAILEXCLUDE</kbd>; <kbd>TYPEINCLUDE</kbd> and <kbd>TYPEEXCLUDE</kbd>; <kbd>DIRINCLUDE</kbd> and <kbd>DIREXCLUDE</kbd>; <kbd>HOSTREPINCLUDE</kbd> and <kbd>HOSTREPEXCLUDE</kbd>; <kbd>DOMINCLUDE</kbd> and <kbd>DOMEXCLUDE</kbd>; <kbd>ORGINCLUDE</kbd> and <kbd>ORGEXCLUDE</kbd>; <kbd>REFREPINCLUDE</kbd> and <kbd>REFREPEXCLUDE</kbd>; <kbd>REFSITEINCLUDE</kbd> and <kbd>REFSITEEXCLUDE</kbd>; <kbd>SEARCHQUERYINCLUDE</kbd> and <kbd>SEARCHQUERYEXCLUDE</kbd>; <kbd>SEARCHWORDINCLUDE</kbd> and <kbd>SEARCHWORDEXCLUDE</kbd>; <kbd>REDIRREFINCLUDE</kbd> and <kbd>REDIRREFEXCLUDE</kbd>; <kbd>FAILREFINCLUDE</kbd> and <kbd>FAILREFEXCLUDE</kbd>; <kbd>BROWSUMINCLUDE</kbd> and <kbd>BROWSUMEXCLUDE</kbd>; <kbd>FULLBROWINCLUDE</kbd> and <kbd>FULLBROWEXCLUDE</kbd>; <kbd>OSINCLUDE</kbd> and <kbd>OSEXCLUDE</kbd>; <kbd>VHOSTREPINCLUDE</kbd> and <kbd>VHOSTREPEXCLUDE</kbd>; <kbd>USERREPINCLUDE</kbd> and <kbd>USERREPEXCLUDE</kbd>; and <kbd>FAILUSERINCLUDE</kbd> and <kbd>FAILUSEREXCLUDE</kbd>. 
The inclusion or exclusion applies to the unaliased name, if you are doing any <a href="#OUTPUTALIAS">output aliases</a>. <p> <!-- not just in output IN/EXCLUDEs, although the layout of this text might --> <!-- imply that so as to present REQINCLUDE pages in the right place --> You can also use the symbolic word <kbd>pages</kbd> in suitable <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands; one very common command is <pre> REQINCLUDE pages </pre> to include only pages in the request report. <hr> <a name="PAGEINCLUDE">Analog determines</a> which files should count as pages (and thus which requests count as page requests) using another <kbd>INCLUDE</kbd>/<kbd>EXCLUDE</kbd> pair, called <kbd>PAGEINCLUDE</kbd> and <kbd>PAGEEXCLUDE</kbd>. By default, <kbd>*.html</kbd>, <kbd>*.htm</kbd> and directories (<kbd>*/</kbd>) count as pages. But you can change the list by commands like
<pre>
PAGEINCLUDE *.ps,*.ps.gz
PAGEEXCLUDE /sret1.html
</pre>
I.e., Postscript and gzipped Postscript are pages, but <kbd>/sret1.html</kbd> isn't. (If the file has <a href="#args">search arguments</a>, the <kbd>PAGEINCLUDE</kbd> and <kbd>PAGEEXCLUDE</kbd> are reckoned just on the part of the filename before the question mark.) <hr> <a name="LINKINCLUDE">There is one more</a> set of <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands which I'll describe now. In the Request Report and the three referrer reports (Referrer Report, Redirected Referrer Report and Failed Referrer Report), analog can link to the files which it's listing. There are commands <kbd>LINKINCLUDE</kbd> and <kbd>LINKEXCLUDE</kbd> for the Request Report, and <kbd>REFLINKINCLUDE</kbd> and <kbd>REFLINKEXCLUDE</kbd> for the referrer reports, to specify exactly which files are linked to. So, for example, <kbd> REFLINKINCLUDE pages </kbd> would link to pages in the three referrer reports. <hr> There is one final set of <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands to include or exclude the search arguments at the end of URLs. 
But there are some slightly complicated issues surrounding those, so they deserve a <a href="#args">new section</a>. <hr> <hr> <a name="args"><h2>Search arguments</h2> </a> Sometimes a URL contains arguments after a question mark. For example, the URL <pre> /cgi-bin/script.pl?x=1&y=2 </pre> runs the <kbd>/cgi-bin/script.pl</kbd> program with arguments <kbd>x=1</kbd> and <kbd>y=2</kbd>. (Sometimes the server records these arguments in a separate field in the logfile, but if so you can use the <kbd>%q</kbd> field in the <kbd><a href="#fmtstrings">LOGFORMAT</a></kbd> command, and analog will translate the filename to the above format). <p> You can tell analog either to read or to ignore the arguments using the commands <kbd>ARGSINCLUDE</kbd> and <kbd>ARGSEXCLUDE</kbd> which we'll discuss <a href="#ARGSINCLUDE">in a minute</a>. But by default, all arguments are read, and as this is usually what you want, you don't usually need those commands. <p> You don't always see the arguments in the reports, even if they're being read, because analog doesn't show them if there aren't enough of them. In order to see them, you have to set the corresponding <kbd><a href="#ARGSFLOOR">ARGSFLOOR</a></kbd> parameter low enough. <p> Also note that within a report, the search arguments are listed immediately under the file to which they refer. This temporarily interrupts the normal order of the files. It may be clearer if you turn the <a href="#othCOLS"><kbd>N</kbd> column</a> on. <hr> Assuming that the arguments are being read, analog treats the file <kbd>/cgi-bin/script.pl?x=1&y=2</kbd> as a different file from <kbd>/cgi-bin/script.pl</kbd> (or from <kbd>/cgi-bin/script.pl?y=2&x=1</kbd> for that matter). It doesn't look like that in the Request Report because you see a grand total for <kbd>/cgi-bin/script.pl</kbd> with all its different arguments. But it matters if you want to do <a href="#include">inclusions and exclusions</a> or <a href="#useraliases">aliases</a> on the file. 
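<p> To make the distinction concrete, here is a small Python sketch. It is not how analog stores its data; the sample requests and the variable names <kbd>per_file</kbd> and <kbd>grand_total</kbd> are invented for the illustration. It counts each distinct URL string as its own file, and separately adds up a grand total for the part before the question mark.

```python
from collections import Counter

# Hypothetical requests as they might appear in a logfile
requests = [
    "/cgi-bin/script.pl?x=1&y=2",
    "/cgi-bin/script.pl?y=2&x=1",
    "/cgi-bin/script.pl?x=1&y=2",
    "/cgi-bin/script.pl",
]

# With arguments being read, every distinct string is a different file,
# even when only the order of the arguments differs.
per_file = Counter(requests)

# The grand total shown for the file sums over all its argument variants.
grand_total = Counter(url.split("?", 1)[0] for url in requests)
```

Here <kbd>per_file</kbd> has three separate entries, while <kbd>grand_total</kbd> has a single entry of 4 for <kbd>/cgi-bin/script.pl</kbd>. So a command written for the bare filename does not automatically cover the versions with arguments.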
<p> <a name="unintuitive">The reason</a> is that, for example, the command <pre> FILEINCLUDE /cgi-bin/script.pl </pre> <em>doesn't</em> match the file <kbd>/cgi-bin/script.pl?x=1&y=2</kbd>. To match that, you would have to use something like <pre> FILEINCLUDE /cgi-bin/script.pl* </pre> instead. Similarly <pre> FILEALIAS /cgi-bin/script.pl /script.pl </pre> will change <kbd>/cgi-bin/script.pl</kbd> itself, but not <kbd>/cgi-bin/script.pl?x=1&y=2</kbd>. You might want to use something like <pre> FILEALIAS /cgi-bin/script.pl?* /script.pl?$1 </pre> as well. (However, <kbd>PAGEINCLUDE</kbd> and <kbd>PAGEEXCLUDE</kbd> always refer to the part of the filename before the question mark.) <hr> <a name="ARGSINCLUDE">The alternative</a> is to tell analog not to read the search arguments. There are commands called <kbd>ARGSINCLUDE</kbd> and <kbd>ARGSEXCLUDE</kbd>, and <kbd>REFARGSINCLUDE</kbd> and <kbd>REFARGSEXCLUDE</kbd>, to do this. They work the same as the <a href="#include">other <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd></a> commands which we discussed in the previous section. So, for example, if the command <pre> ARGSEXCLUDE /cgi-bin/script.pl </pre> were given, analog would ignore the arguments to that file, and so read <kbd>/cgi-bin/script.pl?x=1&y=2</kbd> as just <kbd>/cgi-bin/script.pl</kbd>. On the other hand, if <pre> ARGSINCLUDE /cgi-bin/script.pl </pre> were specified, analog would read the arguments, and so treat <kbd>/cgi-bin/script.pl?x=1&y=2</kbd> as a different file from <kbd>/cgi-bin/script.pl</kbd>. <kbd>REFARGSINCLUDE</kbd> and <kbd>REFARGSEXCLUDE</kbd> are the same for referrers. <p> Technical note: the check for whether the arguments should be included happens before the filename has been subject to either built-in or user-specified <a href="#alias">aliases</a>. So you have to use the unaliased name, exactly as it occurs in the logfile. 
For example, <kbd>ARGSINCLUDE /~sret1/script.pl</kbd> won't match <kbd>/%7Esret1/script.pl</kbd> even though they are really the same file. It also means that you can't use "<kbd>pages</kbd>" in the <kbd>ARGSINCLUDE</kbd> or <kbd>ARGSEXCLUDE</kbd> command, because we don't know whether a file is a page until after it's been aliased. <hr> <a name="SEARCHENGINE">There is a related command</a> called <kbd>SEARCHENGINE</kbd>. If you have referrers with search arguments, usually from search engines, you can tell analog which field corresponds to the search term. It uses this information to compile the Search Query Report and the Search Word Report. For example, consider the referrer <pre> http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=carrot+cake </pre> The search term is in the field <kbd>q=</kbd> so the appropriate <kbd>SEARCHENGINE</kbd> command is <pre> SEARCHENGINE http://www.altavista.com/cgi-bin/query q </pre> or even better <pre> SEARCHENGINE http://*altavista.*/* q </pre> to allow for all their mirror sites in different countries. <p> Sometimes a search engine has two or more possible fields for the search term. In that case you can list all of them separated by commas, like this: <pre> SEARCHENGINE http://*webcrawler.*/* search,searchText </pre> <hr> The rest of this section is a bit technical, and you usually don't need to worry about it. On a first reading, you might like to <a href="#output">skip it</a>. <p> <a name="SCC">I said</a> <a href="#DIRSUFFIX">previously</a> that <kbd>%7E</kbd> in a URL is automatically converted to <kbd>~</kbd>, etc. In fact this is only done to the ASCII-printable characters <kbd>%20-%7E</kbd> (because these are the only characters that are the same in every character set). <p> But in the Search Query Report and Search Word Report it is useful to be able to convert non-ASCII characters too, so that you can see the actual words people typed, rather than get the <kbd>%nm</kbd> codes in place of all accented letters. 
So in these reports analog also converts characters <kbd>%A0-%FF</kbd> (if you are using an ISO-8859-* character set) or <kbd>%80-%FF</kbd> (for other character sets, apart from ASCII). <p> However, there are reasons why you might not want this feature, and you can turn it off with the command <pre> SEARCHCHARCONVERT OFF </pre> These reasons include: <ol> <li>The character set in which the query was submitted to the search engine may not be the same as that in which the page reached was written, or that in which the analog output page is being written. So converting to the character set of the analog output page may give garbage anyway. This is particularly a problem with languages, such as Russian or Chinese, which have two or more character sets in common use. It is also a problem for sites which host resources in many languages. <li>Not all of the character positions correspond to printable characters in every character set. Analog knows that <kbd>%80-%9F</kbd> are non-printable in the ISO-8859-* character sets, but apart from that it converts everything in <kbd>%80-%FF</kbd>. So you may end up with non-printable characters in your output. <li>I have no idea how well, if at all, this feature will work with multibyte character sets (such as most East Asian languages). You will probably find you want to turn it off in this case. </ol> <hr> <hr> <a name="output"><h2>Configuring the output</h2> </a> So far we have mainly discussed commands which control how analog reads the logfiles. We now get on to commands for configuring the output. <p> <a name="replist">There are 32 different reports</a> which analog can produce, if your logfiles contain the necessary information. 
Each one has a short name, and a code letter or number, as follows:
<pre>
x  <kbd>GENERAL</kbd>       General Summary
m  <kbd>MONTHLY</kbd>       Monthly Report
W  <kbd>WEEKLY</kbd>        Weekly Report
D  <kbd>FULLDAILY</kbd>     Daily Report
d  <kbd>DAILY</kbd>         Daily Summary
H  <kbd>FULLHOURLY</kbd>    Hourly Report
h  <kbd>HOURLY</kbd>        Hourly Summary
4  <kbd>QUARTER</kbd>       Quarter-Hour Report
5  <kbd>FIVE</kbd>          Five-Minute Report
S  <kbd>HOST</kbd>          Host Report
Z  <kbd>ORGANISATION</kbd>  Organisation Report
o  <kbd>DOMAIN</kbd>        Domain Report
r  <kbd>REQUEST</kbd>       Request Report
i  <kbd>DIRECTORY</kbd>     Directory Report
t  <kbd>FILETYPE</kbd>      File Type Report
z  <kbd>SIZE</kbd>          File Size Report
P  <kbd>PROCTIME</kbd>      Processing Time Report
E  <kbd>REDIR</kbd>         Redirection Report
I  <kbd>FAILURE</kbd>       Failure Report
f  <kbd>REFERRER</kbd>      Referrer Report
s  <kbd>REFSITE</kbd>       Referring Site Report
N  <kbd>SEARCHQUERY</kbd>   Search Query Report
n  <kbd>SEARCHWORD</kbd>    Search Word Report
k  <kbd>REDIRREF</kbd>      Redirected Referrer Report
K  <kbd>FAILREF</kbd>       Failed Referrer Report
B  <kbd>FULLBROWSER</kbd>   Browser Report
b  <kbd>BROWSER</kbd>       Browser Summary
p  <kbd>OSREP</kbd>         Operating System Report
v  <kbd>VHOST</kbd>         Virtual Host Report
u  <kbd>USER</kbd>          User Report
J  <kbd>FAILUSER</kbd>      Failed User Report
c  <kbd>STATUS</kbd>        Status Code Report
</pre>
For details on what the various reports mean, and a summary of the commands which control them, see the section on <cite><a href="#reports">Analog's reports</a></cite>. <hr> <a name="ONOFF">You can turn each report on or off</a> with configuration commands like
<pre>
FIVE OFF
REFSITE ON
</pre>
or by using command line arguments like <kbd>-5</kbd> and <kbd>+s</kbd>. You can also turn all reports except the General Summary on or off with the commands <kbd>ALL ON</kbd> and <kbd>ALL OFF</kbd>, or with the command line arguments <kbd>+A</kbd> and <kbd>-A</kbd>. 
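<p> The effect of these command line arguments can be sketched in Python as follows. This is only a toy model, not analog's argument parser: the function name <kbd>apply_report_args</kbd> is made up, only a handful of the code letters from the table are included, and arguments that are not report codes (analog has many others) are simply ignored.

```python
# A few code letters from the table above, mapped to their short names
REPORT_CODES = {
    "x": "GENERAL",
    "m": "MONTHLY",
    "5": "FIVE",
    "r": "REQUEST",
    "s": "REFSITE",
}

def apply_report_args(args, enabled):
    """Apply +c / -c style arguments to a dict of report name to on/off flag."""
    for arg in args:
        turn_on = arg[0] == "+"
        code = arg[1:]
        if code == "A":
            # +A / -A affect every report except the General Summary
            for name in enabled:
                if name != "GENERAL":
                    enabled[name] = turn_on
        elif code in REPORT_CODES:
            enabled[REPORT_CODES[code]] = turn_on
    return enabled

# Start with everything on, then turn all off and REFSITE back on
enabled = {name: True for name in REPORT_CODES.values()}
apply_report_args(["-A", "+s", "-5"], enabled)
```

After this, only the General Summary and the Referring Site Report are still switched on. Arguments are applied in order, so a later argument overrides an earlier one in this sketch.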
<p> <a name="GOTOS">You can turn the "Go To" lines</a> in the report off with the command <pre> GOTOS OFF </pre> <kbd>GOTOS ON</kbd> turns them on again, and <kbd>GOTOS FEW</kbd> puts the "Go To" lines just at the top and bottom. <kbd>GOTOS OFF</kbd> can be abbreviated with the <kbd>-X</kbd> command line argument, and <kbd>GOTOS ON</kbd> with <kbd>+X</kbd>. <p> <a name="RUNTIME">You can turn</a> the "Running Time" line at the bottom of the report off with the command <pre> RUNTIME OFF </pre> and on again with <kbd>RUNTIME ON</kbd>. <p> <a name="LASTSEVEN">The figures in parentheses</a> in the General Summary are for the last seven days: either the seven days before the <kbd>TO</kbd> time, or if no <kbd>TO</kbd> time is given, the seven days before the time of the program start. The figures for the last seven days are normally included if some, but not all, of the requests fall in those seven days; but you can turn them off by means of the command <pre> LASTSEVEN OFF </pre> Of course <kbd>LASTSEVEN ON</kbd> turns them on again. <p> <a name="REPORTORDER">You can change the order</a> of the reports by means of the <kbd>REPORTORDER</kbd> command. You should list the code letters for all possible reports in the order you want them, like this: <pre> REPORTORDER xcmdDhH45WriSoEItzsfKkuJvbB </pre> <hr> <a name="OUTFILE">You can change which file</a> the output goes to with a command like <pre> OUTFILE stats.htm </pre> or with a command line argument like <kbd>+Ostats.htm</kbd>. If you use the filename <kbd>-</kbd> or <kbd>stdout</kbd>, the output will go to standard output, which is normally the screen, but Unix users might like to redirect it to another file or even into a pipe. You can also use an absolute path name, like
<pre>
OUTFILE /usr/bin/httpd/htdocs/stats.html                   # Unix
OUTFILE "Hard Disk:Server Apps:WebSTAR:Analog:Report.html" # Mac
</pre>
<p> Sometimes it's convenient to include the date in the name of the <kbd>OUTFILE</kbd>. 
You can do this by including the following codes in the filename.
<pre>
%D  date of month
%m  month name
%M  month number
%y  two-digit year
%Y  four-digit year
%H  hour
%n  minute
%w  day of week
</pre>
So for example, <pre> OUTFILE stats%y%M.html </pre> will produce filenames like <kbd>stats9905.html</kbd>. The date used is the <kbd><a href="#FROMTO">TO</a></kbd> date if one was specified, and otherwise the time of the start of the program. <hr> <a name="outstyle">Now we come</a> to some very important commands. The first is the <kbd>OUTPUT</kbd> command, which changes the style of the output. There are three possible output styles, <kbd>HTML</kbd>, <kbd>ASCII</kbd> and <kbd>COMPUTER</kbd>. The first produces Web pages, the second plain text files (which you could mail to people, for example) and the third produces output suitable for reading by a computer (useful for reading into a spreadsheet, or post-processing with a graphics package, for example). There is a separate section about the <cite><a href="#compout">Computer readable output</a></cite> later. As well as a command like <pre> OUTPUT ASCII </pre> you can also select <kbd>ASCII</kbd> style with the command line argument <kbd>+a</kbd>, and <kbd>HTML</kbd> with the command line argument <kbd>-a</kbd>. You can also specify <kbd>OUTPUT NONE</kbd> for no output, if you are producing a <a href="#cache">cache file</a>. <hr> <a name="LANGUAGE">Next, you can change the language</a> of the output. There are two ways to do this. The usual way is to use the <kbd>LANGUAGE</kbd> command. For example, the command <pre> LANGUAGE FRENCH </pre> will give you the output in French. 
The available languages at the moment are <kbd>ARMENIAN</kbd>, <kbd>BOSNIAN</kbd>, <kbd>CATALAN</kbd>, <kbd>SIMP-CHINESE</kbd> (GB2312 encoding), <kbd>TRAD-CHINESE</kbd> (Big5 encoding), <kbd>CZECH</kbd>, <kbd>DANISH</kbd>, <kbd>DUTCH</kbd>, <kbd>ENGLISH</kbd>, <kbd>US-ENGLISH</kbd>, <kbd>FINNISH</kbd>, <kbd>FRENCH</kbd>, <kbd>GERMAN</kbd>, <kbd>GREEK</kbd>, <kbd>ITALIAN</kbd>, <kbd>JAPANESE</kbd>, <kbd>NORWEGIAN</kbd> (Bokmål), <kbd>NYNORSK</kbd>, <kbd>POLISH</kbd>, <kbd>PORTUGUESE</kbd>, <kbd>BR-PORTUGUESE</kbd>, <kbd>RUSSIAN</kbd>, <kbd>SERBIAN</kbd>, <kbd>SLOVAK</kbd>, <kbd>SLOVENE</kbd>, <kbd>SPANISH</kbd>, <kbd>SWEDISH</kbd>, <kbd>TURKISH</kbd> and <kbd>UKRAINIAN</kbd>. <p><i>Note: The following additional languages were available in version 3 of analog: <kbd>HUNGARIAN</kbd>, <kbd>ICELANDIC</kbd>, <kbd>KOREAN</kbd>, <kbd>LATVIAN</kbd>, <kbd>LITHUANIAN</kbd> and <kbd>ROMANIAN</kbd>. I hope that they will be available for this version soon. As they are translated, they will be added to the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>. Version 3 of analog will also be available at the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a> for a while, if you need one of these languages. Alternatively, you can find the language files in the <kbd>lang</kbd> directory, and translate the few new English phrases yourself.</i> <p> The other way is to use the <kbd>LANGFILE</kbd> command. This is useful if you want to download a new language from the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a>, or if you want to translate one yourself, or even if you want to change some words or phrases or the way the dates and times are formatted in the output. The <kbd>LANGFILE</kbd> command tells analog in which file to find the various words and phrases for a new language. 
For example, the command
<pre>
LANGFILE lang/guarani.lng
# or
LANGFILE /usr/etc/httpd/analog/lang/guarani.lng
</pre>
would read from that file. (Note that you have to include the directory name if the file isn't in the directory or folder which you're running analog from. In particular, it's not assumed to be in the same directory as the other language files.) <p> Some languages also have <a href="#domfile">domains files</a> available. These are normally selected automatically by the <kbd>LANGUAGE</kbd> command. But you can tell analog to use a different domains file with the <kbd><a href="#domfile">DOMAINSFILE</a></kbd> command. Also, some languages have translations of the <a href="#form">form interface</a>. <p> If you want to translate another language, I would be delighted! You'd be wise to contact me first to make sure that no-one else is already translating the same language. The English language file contains some brief instructions for translating new languages. <hr> <a name="TIMEOFFSET">Sometimes</a> your server is not in the same timezone as you, or at least records the times in its logfiles in a different timezone (for example GMT). So that you can get your statistics in your local time, there is a command called <kbd>LOGTIMEOFFSET</kbd> to change the time by a certain number of minutes. As with the <kbd><a href="#logfmt">LOGFORMAT</a></kbd> command, this only affects logfiles which come <em>later</em> in the <em>same</em> configuration file. <p> You have to be careful using this command. Because of daylight savings time in operation in different parts of the world at different times, analog cannot attempt to convert between different timezones. So it's your responsibility to set the right offset for different times of year. For example, if you were in Chicago, but your server was recording time in GMT, you would need to specify two different time offsets, one of minus five hours for summer and one of minus six hours for winter. 
You would need to split your logfiles in the right places and then run commands like
<pre>
LOGTIMEOFFSET -300
LOGFILE summer*.log
LOGTIMEOFFSET -360
LOGFILE winter*.log
</pre>
<p> There is also a related command called <kbd>TIMEOFFSET</kbd>. This tells analog how much to offset the time of the computer on which it is running (rather than the computer running the server), to get your local time. <hr> <a name="NOROBOTS">There is a command</a> called <kbd>NOROBOTS</kbd> which stops robots which obey the <a href="http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta">robots META tag</a> from indexing your output page or following its links. Normally this is set to <kbd>ON</kbd> but you can specify <kbd>NOROBOTS OFF</kbd> if you don't mind robots finding your other pages this way. Note that you will stop far more robots if you put your stats page in your <kbd><a href="http://info.webcrawler.com/mak/projects/robots/exclusion.html#robotstxt">robots.txt</a></kbd> file; on the other hand, this file has to be kept up to date by the server administrator. <hr> <a name="IMAGEDIR">There are a few</a> more minor, although cosmetically important, commands affecting the output. First there's a command <kbd>IMAGEDIR</kbd> which tells analog where the various images used to make the report live. It could be a relative or an absolute URL: for example
<pre>
IMAGEDIR img/   # within the same directory as the output
IMAGEDIR /img/  # off the root directory of your server
</pre>
<p> <a name="LOGO">There are three commands</a> which affect the top line of the output. First, the <kbd>LOGO</kbd> command allows you to replace the analog logo with another image (for example, your organisation's logo). 
You can say
<pre>
LOGO picture.gif           # for this file
LOGO /images/picture2.gif  # a different file
LOGO none                  # for no logo
</pre>
The logo is assumed to be inside the <kbd>IMAGEDIR</kbd> unless it starts with a slash, or contains <kbd>://</kbd>. <p> <a name="HOSTNAME">Then</a> there are commands <kbd>HOSTNAME</kbd> and <kbd>HOSTURL</kbd> which affect the name and link at the end of the title line. For example, I might specify
<pre>
HOSTNAME "Stephen Turner"
HOSTURL http://www.statslab.cam.ac.uk/~sret1/
</pre>
to generate the title "Web Server Statistics for <a href="http://www.statslab.cam.ac.uk/~sret1/">Stephen Turner</a>". Again, you can use <kbd>none</kbd> as the <kbd>HOSTURL</kbd> to specify no link. Analog will normally translate characters in the hostname to HTML if necessary. So to include literal HTML, such as accented characters, in the output you need to precede them by a backslash, like this: <pre> HOSTNAME "M\&amp;uuml;ller &amp; S\&amp;ouml;hne" </pre> <hr> <a name="HEADERFILE">There are commands</a> called <kbd>HEADERFILE</kbd> and <kbd>FOOTERFILE</kbd>. These let you specify files to be inserted near the top and bottom of your output. You can specify <pre> HEADERFILE none </pre> to cancel a previously-specified header file. <hr> <a name="STYLESHEET">There is a command</a> called <kbd>STYLESHEET</kbd> to specify a style sheet for the output. This allows you to specify colours etc. (See <a href="http://www.w3.org/Style/css/">http://www.w3.org/Style/css/</a> for how to write a style sheet.) For example,
<pre>
STYLESHEET /housestyle.css
STYLESHEET none   # to cancel it
</pre>
<i>Hint: a common mistake in writing style sheets is to declare a font-family for the body, but then not put &lt;pre&gt; sections back into a monospaced font. This stops the columns lining up properly. 
Your style sheet should contain a line like the following:</i> <pre> PRE, TT, CODE, KBD, SAMP { font-family: monospace } </pre> <hr> <a name="SEPCHAR">There are three</a> related commands called <kbd>SEPCHAR</kbd>, <kbd>REPSEPCHAR</kbd> and <kbd>DECPOINT</kbd>. These specify single characters to be used as the thousands separator in numbers, the thousands separator within the columns in the reports, and the decimal point. For example, a French user might choose
<pre>
SEPCHAR " "
REPSEPCHAR none
DECPOINT ,
</pre>
to make "three thousand and a quarter" look like "3 000,25" in text and "3000,25" in the reports. <hr> <a name="RAWBYTES">There is a command</a> called <kbd>RAWBYTES</kbd>. Specify <kbd>RAWBYTES ON</kbd> if you want the exact number of bytes to be listed in reports, or <kbd>RAWBYTES OFF</kbd> if you want the number of kilobytes or Megabytes as appropriate to be listed instead. <hr> <a name="PAGEWIDTH">Finally</a> there are commands called <kbd>HTMLPAGEWIDTH</kbd> and <kbd>ASCIIPAGEWIDTH</kbd> which specify the width of the page. Obviously, the former is used when the output style is HTML, and the latter when the output style is ASCII. The output is not guaranteed to fit in this width, but analog will take notice of it when choosing the width of the time graphs, when sorting the host report alphabetically, when drawing horizontal rules, and when writing some bits of text. <hr> There are now some sections about configuring the output of particular reports, under the following headings: <cite><a href="#timereps">Time reports</a></cite>, <cite><a href="#othreps">Other reports</a></cite> and <cite><a href="#hierreps">Hierarchical reports</a></cite>. <hr> <hr> <a name="timereps"><h2>Time reports</h2> </a> This section is about commands which control the appearance of the time reports. There are eight such reports, which show the pattern of usage over time. 
Six of them show the usage at specific times, whilst the Hourly Summary and the Daily Summary show the total (not average) activity at particular times of day and week over the whole time period of the report. <p> <a name="timeCOLS">Each time report</a> can contain columns listing the requests, requests for pages, and bytes transferred at that time, using the following code letters. <dl compact> <dt><kbd>R</kbd><dd>Number of requests <dt><kbd>r</kbd><dd>Percentage of the requests <dt><kbd>P</kbd><dd>Number of page requests <dt><kbd>p</kbd><dd>Percentage of the page requests <dt><kbd>B</kbd><dd>Number of bytes transferred <dt><kbd>b</kbd><dd>Percentage of the bytes </dl> Which columns appear in which reports is controlled by various <kbd>COLS</kbd> commands. For example, the command <pre> HOURCOLS Pb </pre> tells analog to include the number of page requests and percentage of the bytes, in that order, as the columns for the Hourly Summary. The other <kbd>COLS</kbd> commands are <kbd>MONTHCOLS</kbd>, <kbd>WEEKCOLS</kbd>, <kbd>DAYCOLS</kbd> (Daily Summary), <kbd>FULLDAYCOLS</kbd> (Daily Report), <kbd>FULLHOURCOLS</kbd> (Hourly Report), <kbd>QUARTERCOLS</kbd> and <kbd>FIVECOLS</kbd>. There is also a <kbd>TIMECOLS</kbd> command, which specifies that all the time reports are to have the specified columns. <hr> <a name="GRAPH">Similarly</a>, analog can plot the bar charts in the time reports according to the number of requests, number of page requests, or number of bytes. This is controlled by the <kbd>GRAPH</kbd> family of commands. So, for example, <pre> FULLDAYGRAPH P </pre> tells analog to plot the bar charts in the Daily Report by the number of page requests. This also controls how analog decides which is the busiest time period in the bottom line of the report. Using a lower case letter tells analog to plot the bar charts with ASCII characters instead of the normal red bars. 
(This produces shorter output, and it is how they appear anyway in <kbd>ASCII</kbd> output style, or when viewed with a non-graphical browser.) So, for example, <pre> FULLDAYGRAPH b </pre> would plot the Daily Report by bytes, without using the graphics. The other <kbd>GRAPH</kbd> commands are <kbd>MONTHGRAPH</kbd>, <kbd>WEEKGRAPH</kbd>, <kbd>DAYGRAPH</kbd>, <kbd>HOURGRAPH</kbd>, <kbd>FULLHOURGRAPH</kbd>, <kbd>QUARTERGRAPH</kbd> and <kbd>FIVEGRAPH</kbd>. There's also an <kbd>ALLGRAPH</kbd> command to set all of them simultaneously. <hr> <a name="BARSTYLE">There are various</a> possible graphics available for the graphs, controlled by the <kbd>BARSTYLE</kbd> command, as follows. (They will all look the same if you have a non-graphical browser.)
<pre><tt>
BARSTYLE a  <img src="bara8.gif" alt="+++++++++++">
BARSTYLE b  <img src="barb8.gif" alt="+++++++++++">
BARSTYLE c  <img src="barc8.gif" alt="+++++++++++">
BARSTYLE d  <img src="bard8.gif" alt="+++++++++++">
BARSTYLE e  <img src="bare8.gif" alt="+++++++++++">
BARSTYLE f  <img src="barf8.gif" alt="+++++++++++">
BARSTYLE g  <img src="barg8.gif" alt="+++++++++++">
BARSTYLE h  <img src="barh8.gif" alt="+++++++++++">
</tt></pre>
The default style is <kbd>b</kbd>. <hr> <a name="BACK">You can plot the graphs</a> either forwards in time (starting from the earliest date) or backwards (starting from the latest date). Use commands like
<pre>
MONTHBACK ON   # Monthly Report backwards
WEEKBACK OFF   # Weekly Report forwards
</pre>
The other <kbd>BACK</kbd> commands are <kbd>FULLDAYBACK</kbd>, <kbd>FULLHOURBACK</kbd>, <kbd>QUARTERBACK</kbd> and <kbd>FIVEBACK</kbd>. It tends to be confusing to mix directions (and analog will warn you if you attempt it) so usually you want to use the <kbd>ALLBACK</kbd> command which will set all of them at once. <hr> <a name="ROWS">For the more detailed time reports</a>, you usually only want to list the last few time periods. (Every five minutes for the last three years?? I think not.) 
So analog provides some <kbd>ROWS</kbd> commands to let you specify how many rows you want in the time reports. For example <pre>
QUARTERROWS 96   # only the last day's worth
MONTHROWS 0      # 0 means no restriction: show all time
</pre> The other <kbd>ROWS</kbd> commands are <kbd>WEEKROWS</kbd>, <kbd>FULLDAYROWS</kbd>, <kbd>FULLHOURROWS</kbd> and <kbd>FIVEROWS</kbd>. Even if a <kbd>ROWS</kbd> command is given, the line at the bottom of the report will still show the busiest time period ever, not just the busiest one in that many rows. <hr> <a name="MARKCHAR">The character</a> which is used for plotting the graphs in ASCII style or on a non-graphical browser is specified by means of the <kbd>MARKCHAR</kbd> command. For example, <pre>
MARKCHAR =
</pre> tells analog to use the equals sign. <p> <a name="MINGRAPHWIDTH">There is a parameter</a> called <kbd>MINGRAPHWIDTH</kbd> which sets the minimum nominal size of the graphs. For example, if you set <pre>
MINGRAPHWIDTH 10
</pre> then the graph will be allowed to be up to 10 characters wide, even if that would exceed the <kbd>PAGEWIDTH</kbd>. <p> <a name="WEEKBEGINSON">There is one more command</a> which affects the time reports. You can specify which day should be counted as the first day of the week. This affects the layout of the Daily Report, Daily Summary and Weekly Report. For example, our local student newspaper publishes a new edition on the web every Friday, so they like to specify <kbd> WEEKBEGINSON FRIDAY </kbd> for their reports. <p> In the next section, we'll look at commands relating to the <a href="#othreps">non-time reports</a>. <hr> <hr> <a name="othreps"><h2>Other reports</h2> </a> This section deals with the non-time reports. There are quite a lot of commands which control these reports, although we've seen some of them already. <p> <a name="othCOLS">First</a>, these reports have <kbd>COLS</kbd> commands, just like the time reports. 
(See the section on <cite><a href="#timeCOLS">Time reports</a></cite> for how to use these commands.) In the non-time reports, two additional columns are available, namely <kbd>D</kbd> for date of last access, and <kbd>N</kbd> for the number of the item in the list. So, for example, <pre> REQCOLS NRD </pre> numbers the files in the Request Report, listing the number of requests for each and the time when each was last requested. The full list of <kbd>COLS</kbd> commands for non-time reports is <kbd>HOSTCOLS</kbd>, <kbd>ORGCOLS</kbd>, <kbd>DOMCOLS</kbd>, <kbd>REQCOLS</kbd>, <kbd>DIRCOLS</kbd>, <kbd>TYPECOLS</kbd>, <kbd>SIZECOLS</kbd>, <kbd>PROCTIMECOLS</kbd>, <kbd>REDIRCOLS</kbd>, <kbd>FAILCOLS</kbd>, <kbd>REFCOLS</kbd>, <kbd>REFSITECOLS</kbd>, <kbd>SEARCHQUERYCOLS</kbd>, <kbd>SEARCHWORDCOLS</kbd>, <kbd>REDIRREFCOLS</kbd>, <kbd>FAILREFCOLS</kbd>, <kbd>FULLBROWCOLS</kbd> (Browser Report), <kbd>BROWCOLS</kbd> (Browser Summary), <kbd>OSCOLS</kbd>, <kbd>VHOSTCOLS</kbd>, <kbd>USERCOLS</kbd>, <kbd>FAILUSERCOLS</kbd> and <kbd>STATUSCOLS</kbd>. Not every column is allowed in every report, but if you specify an illegal one, analog will warn you about it. <hr> <a name="SORTBY">Next</a> you need to know how to use a <kbd>SORTBY</kbd> command to specify how the reports should be sorted. There are six possible ways of sorting reports: <kbd>REQUESTS</kbd>, <kbd>PAGES</kbd> (i.e., page requests), <kbd>BYTES</kbd>, <kbd>DATE</kbd>, <kbd>ALPHABETICAL</kbd> and <kbd>RANDOM</kbd> (no sorting, sometimes useful for speed in very long reports). For example, the command <pre> HOSTSORTBY ALPHABETICAL </pre> will sort the Host Report alphabetically. 
The other <kbd>SORTBY</kbd> commands are <kbd>ORGSORTBY</kbd>, <kbd>DOMSORTBY</kbd>, <kbd>REQSORTBY</kbd>, <kbd>DIRSORTBY</kbd>, <kbd>TYPESORTBY</kbd>, <kbd>REDIRSORTBY</kbd>, <kbd>FAILSORTBY</kbd>, <kbd>REFSORTBY</kbd>, <kbd>REFSITESORTBY</kbd>, <kbd>SEARCHQUERYSORTBY</kbd>, <kbd>SEARCHWORDSORTBY</kbd>, <kbd>REDIRREFSORTBY</kbd>, <kbd>FAILREFSORTBY</kbd>, <kbd>FULLBROWSORTBY</kbd>, <kbd>BROWSORTBY</kbd>, <kbd>OSSORTBY</kbd>, <kbd>VHOSTSORTBY</kbd>, <kbd>USERSORTBY</kbd>, <kbd>FAILUSERSORTBY</kbd> and <kbd>STATUSSORTBY</kbd>. Again, not every sort method is possible in every report, but you'll be warned if you choose an illegal one. <p> There is one known bug concerned with <kbd>SORTBY ALPHABETICAL</kbd>. The report is sorted before any <kbd><a href="#OUTPUTALIAS">OUTPUTALIAS</a></kbd> is applied. This means that if an <kbd>OUTPUTALIAS</kbd> has been specified for the report, then the report will not be sorted correctly. <hr> <a name="FLOOR">You can also</a> specify a <kbd>FLOOR</kbd> for most reports, saying how much activity an item needs before it is listed on the report. There are lots of possible ways of specifying floors, which I'll list here, using the <kbd>DOMFLOOR</kbd> (Domain Report <kbd>FLOOR</kbd>) command as an example. Essentially each one consists of a number indicating the level of the floor, followed by a letter indicating the floor criterion. 
<pre>
DOMFLOOR 1000r        # all domains with at least 1000 requests
DOMFLOOR 1000p        # at least 1000 requests for pages
DOMFLOOR 1000000b     # at least 1,000,000 bytes transferred
DOMFLOOR 1Mb          # at least 1 megabyte
DOMFLOOR 0.5%r        # 0.5% of the requests (ditto %p and %b)
DOMFLOOR 0.5:r        # 0.5% of the maximum number of requests
                      # for any domain (ditto :p and :b)
DOMFLOOR 970701d      # last access since 1st July 1997
DOMFLOOR -00-01-00d   # last access in last month (see
                      # documentation on FROM and TO commands)
DOMFLOOR -100r        # domains with top 100 number of requests
                      # (ditto -100p, -100b, -100d)
</pre> The other <kbd>FLOOR</kbd> commands are <kbd>HOSTFLOOR</kbd>, <kbd>ORGFLOOR</kbd>, <kbd>REQFLOOR</kbd>, <kbd>DIRFLOOR</kbd>, <kbd>TYPEFLOOR</kbd>, <kbd>REDIRFLOOR</kbd>, <kbd>FAILFLOOR</kbd>, <kbd>REFFLOOR</kbd>, <kbd>REFSITEFLOOR</kbd>, <kbd>SEARCHQUERYFLOOR</kbd>, <kbd>SEARCHWORDFLOOR</kbd>, <kbd>REDIRREFFLOOR</kbd>, <kbd>FAILREFFLOOR</kbd>, <kbd>FULLBROWFLOOR</kbd>, <kbd>BROWFLOOR</kbd>, <kbd>OSFLOOR</kbd>, <kbd>VHOSTFLOOR</kbd>, <kbd>USERFLOOR</kbd>, <kbd>FAILUSERFLOOR</kbd> and <kbd>STATUSFLOOR</kbd>. Once again, not every floor method is legal for every report, but you'll be warned if you try to choose an illegal one. <hr> <a name="othclarg">I've already told you</a> about how to turn each report on and off from the command line using its <a href="#replist">code letter</a>. In fact, you can specify the <kbd>SORTBY</kbd> and the <kbd>FLOOR</kbd> in the same command. Take the example of the Referrer Report. 
If you follow the <kbd>+f</kbd> (to turn the report on) with a letter, it represents the sort method according to the following code: <dl compact> <dt><kbd>r</kbd><dd><kbd>REQUESTS</kbd> <dt><kbd>p</kbd><dd><kbd>PAGES</kbd> <dt><kbd>b</kbd><dd><kbd>BYTES</kbd> <dt><kbd>d</kbd><dd><kbd>DATE</kbd> <dt><kbd>a</kbd><dd><kbd>ALPHABETICAL</kbd> <dt><kbd>x</kbd><dd><kbd>RANDOM</kbd> </dl> You can then, or alternatively, use one of the above <kbd>FLOOR</kbd> formats to specify the floor. If you specify a <kbd>SORTBY</kbd>, you can also leave off the last letter of the floor, and analog will guess it according to the sort method: the floor will be by pages or bytes if that is the sort method, and otherwise by requests. Here are four examples: <dl compact> <dt><kbd>+fp</kbd><dd>means turn the referrer report on and sort it by page requests, but says nothing about the floor; <dt><kbd>+f100r</kbd><dd>means list all referrers with at least 100 requests, but says nothing about the sort method; <dt><kbd>+fb10000</kbd><dd>means list all referrers with at least 10,000 bytes, sorted by bytes; <dt><kbd>+fa-000101d</kbd><dd>means list all referrers with accesses this year, sorted alphabetically. </dl> <hr> We've already seen some other commands affecting what was listed in the non-time reports. The <a href="#outputexcludes">output <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd></a> commands specified lines to omit from each report, and the <kbd><a href="#OUTPUTALIAS">OUTPUTALIAS</a></kbd> commands specified some aliasing to do on the names before they were listed. There were also <a href="#LINKINCLUDE"><kbd>LINKINCLUDE</kbd> and <kbd>LINKEXCLUDE</kbd></a>, and <a href="#LINKINCLUDE"><kbd>REFLINKINCLUDE</kbd> and <kbd>REFLINKEXCLUDE</kbd></a> commands to control what was linked to in the Request Report and the three referrer reports. You might want to have another look at these paragraphs. <p> <a name="BASEURL">There's one other command</a> which affects the links in the Request Report. 
The command <kbd>BASEURL</kbd> prepends an additional string to the URLs in the target of the link. For example, after the command <pre>
BASEURL http://www.statslab.cam.ac.uk
</pre> <kbd>/~sret1/</kbd> will be linked to <kbd>http://www.statslab.cam.ac.uk/~sret1/</kbd>, not just to <kbd>/~sret1/</kbd>. This is very useful if you want to display the statistics on a different server from the server they refer to. If you want the file to be listed as <kbd>http://www.statslab.cam.ac.uk/~sret1/</kbd>, rather than just to be linked to that address, you need to use the second argument to the <kbd><a href="#secondarg">LOGFILE</a></kbd> command instead. <p> In the next section, we'll look at commands for generating <a href="#hierreps">hierarchical reports</a>, which are closely related to the commands in this section. <hr> <hr> <a name="hierreps"><h2>Hierarchical reports</h2> </a> Some of the non-time reports have a hierarchical (or tree) structure: so, for example, each domain in the domain report can have subdomains listed under it, which in turn can have sub-subdomains, and so on. This section describes commands for managing hierarchical reports. <p> First, you need to be able to control what gets listed in the reports. For this you need to use the <kbd>SUB</kbd> family of commands. So, for example, the command <kbd> SUBDIR /~sret1/* </kbd> would ensure that the Directory Report would not only contain an entry for the sum of my files, but also one for each of my subdirectories, something like this: <pre>
29,111: /~sret1/
10,234:   /~sret1/analog/
 5,179:   /~sret1/backgammon/
11,908: /~steve/
</pre> You can have more than one <kbd>*</kbd> in the command. For example <pre>
SUBDOMAIN *.*
</pre> would list the whole Domain Report two levels deep. <p> If you specify a <kbd>SUB</kbd> command, all the intermediate levels are included automatically. 
So, for example, after <pre>
SUBDOMAIN statslab.cam.ac.uk
</pre> <kbd>cam.ac.uk</kbd> and <kbd>ac.uk</kbd> will be included in the Domain Report too, and after <kbd>*.*.ac.uk</kbd>, <kbd>*.ac.uk</kbd> will be included. <p> Here are examples of the other four <kbd>SUB</kbd> commands: <pre>
SUBTYPE *.gz                       # in the File Type Report
SUBBROW */*                        # e.g. Mozilla/4 in the Browser Summary
SUBBROW Mozilla/*.*                # add minor version numbers for Mozilla
REFDIR http://search.yahoo.com/*   # Referring Site Report
SUBORG *.aol.com                   # Organisation Report
SUBORG *.*.com                     # Break down all .com's
</pre> <p> The <kbd>SUBDOMAIN</kbd> command (but none of the others) can include a second argument describing the subdomain. For example <pre>
SUBDOMAIN cam.ac.uk 'University of Cambridge'
</pre> Then that subdomain will be listed with its translation in the Domain Report. You can also have numerical subdomains: e.g., <pre>
SUBDOMAIN 131.111 'University of Cambridge'
</pre> If you sort the subdomains alphabetically, the numerical ones will also be sorted alphabetically, not numerically. I don't think this will cause any problems. <p> One other use for the <kbd>SUBDIR</kbd> command is if you have used the second argument to the <kbd><a href="#secondarg">LOGFILE</a></kbd> command. Suppose you have translated files like <kbd>/index.html</kbd> into <kbd>http://www.mycompany.com/index.html</kbd>. Then the command <pre>
SUBDIR http://*/*
</pre> would be appropriate to make the directory report look right. <hr> <a name="SUBFLOOR">The</a> <a name="SUBSORTBY">lower</a> levels of each report have <kbd>FLOOR</kbd> and <kbd>SORTBY</kbd> commands which work exactly the same as those we have <a href="#SORTBY">already seen</a> for the top level. 
These commands are <kbd>SUBDIRFLOOR</kbd>, <kbd>SUBDOMFLOOR</kbd>, <kbd>SUBORGFLOOR</kbd>, <kbd>SUBTYPEFLOOR</kbd>, <kbd>SUBBROWFLOOR</kbd> and <kbd>REFDIRFLOOR</kbd>; and <kbd>SUBDIRSORTBY</kbd>, <kbd>SUBDOMSORTBY</kbd>, <kbd>SUBORGSORTBY</kbd>, <kbd>SUBTYPESORTBY</kbd>, <kbd>SUBBROWSORTBY</kbd> and <kbd>REFDIRSORTBY</kbd>. <p> A sub-item is listed in a hierarchical report only if it is above the sub-<kbd>FLOOR</kbd>, <i>and</i> it is included with a <kbd>SUB</kbd> command, <i>and</i> it is not excluded because of an <a href="#outputexcludes"><kbd>INCLUDE</kbd> or <kbd>EXCLUDE</kbd></a> command, <i>and</i> its immediate parent is listed. For example, specifying <pre>
SUBDIR /*/*/
SUBDIRFLOOR -3r
SUBDIRSORTBY REQUESTS
</pre> would list the three subdirectories with most requests under each directory. <kbd>SUBDIRFLOOR 1:r</kbd> would have listed any subdirectory with at least 1% of the maximum number of requests of any <em>top level</em> directory. <p> <a name="ARGSFLOOR">The</a> <a name="ARGSSORTBY">three</a> file reports (Request Report, Redirection Report and Failure Report) and the three referrer reports (Referrer Report, Redirected Referrer Report and Failed Referrer Report) are not fully hierarchical, but they do list <a href="#args">search arguments</a> together under the file to which they refer (provided that the arguments have been read in: see the <kbd><a href="#ARGSINCLUDE">ARGSINCLUDE</a></kbd> command). So they have similar sub-<kbd>FLOOR</kbd> and sub-<kbd>SORTBY</kbd> commands, namely <kbd>REQARGSFLOOR</kbd>, <kbd>REDIRARGSFLOOR</kbd>, <kbd>FAILARGSFLOOR</kbd>, <kbd>REFARGSFLOOR</kbd>, <kbd>REDIRREFARGSFLOOR</kbd> and <kbd>FAILREFARGSFLOOR</kbd>; and <kbd>REQARGSSORTBY</kbd>, <kbd>REDIRARGSSORTBY</kbd>, <kbd>FAILARGSSORTBY</kbd>, <kbd>REFARGSSORTBY</kbd>, <kbd>REDIRREFARGSSORTBY</kbd> and <kbd>FAILREFARGSSORTBY</kbd>. 
The same applies to the Operating System Report with its subdivisions of operating systems: it has <kbd>SUBOSFLOOR</kbd> and <kbd>SUBOSSORTBY</kbd>. <hr> The lower levels of a hierarchical report temporarily interrupt the top level, and even though they are indented, this can sometimes make it look as if the report is out of order. If you have a lot of sub-items, for example in the Referrer Report if there are a lot of search arguments, then including the <a href="#othCOLS"><kbd>N</kbd> column</a> can help to make it clearer again. <hr> That concludes the description of all the output configuration commands. Now we move on to some other individual topics, starting with the <a href="#domfile">domains file</a>. <hr> <hr> <a name="domfile"><h2>The domains file</h2> </a> The domains file tells analog which country is represented by each domain. You can tell analog where to find your domains file with a command like <pre>
DOMAINSFILE lang/mydomains.tab
</pre> Normally you don't need this command, because if there is a domains file in your language, it should be selected automatically. But the <kbd>DOMAINSFILE</kbd> command can be useful if you want to use a domains file in a new language, for example. <p> If you haven't got a domains file, you can download one from <a href="http://www.statslab.cam.ac.uk/~sret1/analog/ukdom.tab">http://www.statslab.cam.ac.uk/~sret1/analog/ukdom.tab</a>. It should contain on each line a domain code, followed by a number, followed by its location, like this: <pre>
ad 2 Andorra
ae 3 United Arab Emirates
[...]
</pre> It does not need to be in alphabetical order, though humans may prefer it that way. Subdomains do not go in the domains file: you can list them in the Domain Report using the <kbd><a href="#hierreps">SUBDOMAIN</a></kbd> command. <hr> The number beside each domain represents how many levels deep an "organisation" is considered to be, for the purposes of the Organisation Report. 
For example, consider the hostname <kbd>www.sta.ad</kbd>. The organisation is <kbd>sta.ad</kbd>, at the second level, so Andorra has a 2 in the above list. But in the UAE, a host looks like <kbd>www.economy.gov.ae</kbd>. There is an extra level in the hierarchy, so the UAE has its organisations at level 3. <p> There are some problems with this. A few countries have organisations at both levels 2 and 3 (for example <kbd>asaspace.at</kbd> and <kbd>univie.ac.at</kbd>). In those cases I've favoured false negatives over false positives by using the bigger number. (Also there is a correction which will make most of them right again: the first component is always removed from a hostname of three or more components.) For other countries, I don't have enough information to tell what the level should be. I've just given those a 1. Do <a href="#mailing">let me know</a> if you have any more information, or corrections, for the numbers. <hr> If you want HTML special characters in the domains file, you have to precede them with a backslash, like this: <pre>
am Arm\&amp;eacute;nie
</pre> <p> Only domains which occur in the domains file will get their own line in the Domain Report: the rest are probably spurious, and will be accumulated together as "unknown domains". If you have <a href="#debugs">debugging</a> turned on, you can see which domains were unknown. <p> Lines starting with a hash (<kbd>#</kbd>) in the domains file are considered to be comments. <hr> <hr> <a name="compout"><h2>Computer-readable output style</h2> </a> This section describes the computer-readable output style. You can select this style by means of the command <pre>
OUTPUT COMPUTER
</pre> This style is designed to be easy to read into spreadsheets, or post-process with graphics creation tools, for example. <p> Each line in the output is separated into fields by means of a special string. 
You can specify this string by means of the <kbd>COMPSEP</kbd> command; for example <pre> COMPSEP , </pre> for CSV (comma separated value) format. Make sure not to use anything that might occur in the output: for example, a single or double space would not be suitable. <p> Each line in the preformatted output begins with a letter indicating which report the line is part of. (The code letters for the reports are listed in the section on <cite><a href="#replist">Configuring the Output</a></cite>.) After that, there follows a field indicating the remaining columns in the report (using the letters <kbd>RrPpBbD</kbd> as usual). Then there are the numerical data and then the name of the item. Times actually take up several fields: year, month, date, hour & minute, or as many of those as are necessary to identify the time. <p>The first line of most reports has <kbd>f</kbd> instead of the normal column letters, followed by the floor for the report, in the form it would be written for a <kbd><a href="#FLOOR">FLOOR</a></kbd> command, followed by the <kbd>SORTBY</kbd> using the code letters <dl compact> <dt><kbd>r</kbd><dd><kbd>REQUESTS</kbd> <dt><kbd>p</kbd><dd><kbd>PAGES</kbd> <dt><kbd>b</kbd><dd><kbd>BYTES</kbd> <dt><kbd>d</kbd><dd><kbd>DATE</kbd> <dt><kbd>a</kbd><dd><kbd>ALPHABETICAL</kbd> <dt><kbd>x</kbd><dd><kbd>RANDOM</kbd> </dl> <p> The general summary is a bit different. After an initial <kbd>x</kbd>, there is a two-character code saying what the line contains. 
The possible codes are <dl compact> <dt><kbd>VE</kbd><dd>Version of analog <dt><kbd>HN</kbd><dd><kbd>HOSTNAME</kbd> <dt><kbd>HU</kbd><dd><kbd>HOSTURL</kbd> <dt><kbd>PS</kbd><dd>Program start time <dt><kbd>FR</kbd><dd>Time of first request <dt><kbd>LR</kbd><dd>Time of last request <dt><kbd>E7</kbd><dd>Time last 7 days ends <dt><kbd>SR</kbd><dd>Total successful requests <dt><kbd>S7</kbd><dd>Total successful requests in last 7 days <dt><kbd>PR</kbd><dd>Total successful requests for pages <dt><kbd>P7</kbd><dd>Total successful requests for pages in last 7 days <dt><kbd>FL</kbd><dd>Total failed requests <dt><kbd>F7</kbd><dd>Total failed requests in last 7 days <dt><kbd>RR</kbd><dd>Total redirected requests <dt><kbd>R7</kbd><dd>Total redirected requests in last 7 days <dt><kbd>NC</kbd><dd>Logfile lines without status code <dt><kbd>C7</kbd><dd>Lines without status code in last 7 days <dt><kbd>NF</kbd><dd>Number of distinct files requested <dt><kbd>N7</kbd><dd>Number of distinct files requested in last 7 days <dt><kbd>NH</kbd><dd>Number of distinct hosts served <dt><kbd>H7</kbd><dd>Number of distinct hosts served in last 7 days <dt><kbd>CL</kbd><dd>Number of corrupt lines in the logfile <dt><kbd>UL</kbd><dd>Number of unwanted lines in the logfile <dt><kbd>BT</kbd><dd>Total number of bytes transferred <dt><kbd>B7</kbd><dd>Total number of bytes transferred in last 7 days </dl> <hr> <hr> <a name="cache"><h2>Cache files</h2> </a> Analog has the ability to archive <strong>some</strong> of the data in your logfile into a <i>cache file</i> so that the logfile can be thrown away without losing the most important data. <p> For most people, the cache file will not be needed: compressing the logfile using a standard compression utility such as gzip will be sufficient. Compressing a logfile is very efficient owing to the large number of repeated strings: I find about 12 times compression in practice. 
That in itself may solve your filespace problems, without needing to throw away any information. <p> The cache file is not the best format for post-processing the data or feeding it into a spreadsheet. For that you should use the <a href="#compout">computer readable output style</a>. <p> If you are going to use the cache file feature, it is very important that you understand what is and what is not recorded. It is <strong>not</strong> possible to reconstruct everything of interest in the logfile from the cache file. The cache file does contain information about the total number of requests for each host and each file, but not about, for example, which files were read by which hosts. (To do so would take up as much disk space as the compressed logfile.) So you cannot later look at only one file and see which hosts read that file. Similarly, you cannot later restrict the files or hosts by date, using <kbd>FROM</kbd> and <kbd>TO</kbd> commands. <p> In summary, you should do all the inclusions and exclusions you want when you create the cache file. If you want different sets of inclusions and exclusions, you should create several cache files from the same logfile. You cannot later apply extra inclusions and exclusions accurately. <p> A couple of other minor points: the pattern of failed requests and redirected requests over time is not recorded in the cache file. So although the total number will still be correct, the number in the last 7 days can be under-reported subsequently. And times are only recorded to five-minute resolution. <hr> You can create a cache file by setting the <kbd>CACHEOUTFILE</kbd> to be the file you want the cache to live in. Set <pre> CACHEOUTFILE none </pre> to turn it off again. You will still get the regular output as well as the cache output, unless you request <kbd><a href="#outstyle">OUTPUT NONE</a></kbd>. To avoid overwriting, you cannot set the <kbd>CACHEOUTFILE</kbd> to be a file which already exists. 
(Disclaimer: on some systems, race conditions may very occasionally thwart this check. Also on a few systems, making the file writeable but not readable will allow it to be overwritten). You can include the date in the name of the <kbd>CACHEOUTFILE</kbd> in the same way as described earlier for the <kbd><a href="#OUTFILE">OUTFILE</a></kbd>. <p> You can read in a previously-made cache file with the <kbd>CACHEFILE</kbd> command, or with the <kbd>+U</kbd> command line option. As with the <kbd><a href="#logfile">LOGFILE</a></kbd> command, you can use commas and wild cards to read in several cache files, and read compressed cache files using the <kbd>UNCOMPRESS</kbd> mechanism. Note that if you don't want to read a logfile as well as the cache file, you will have to explicitly set the <kbd>LOGFILE</kbd> to <kbd>none</kbd>. <p> When analog reads in a cache file, it will respect inclusions and exclusions as far as it can, but it does not apply any more aliases to the items. (This is to avoid double-aliasing.) So you must do any aliases you want at the time you create the cache file. Similarly, it does not obey the <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> variable, to avoid double-offsetting, so any offset you want must be applied at cache-creation time too. <p> Sometimes you don't want to record all the types of item in the cache file. You might want to forget about which hosts had accessed your web site, for example, and only remember how many times each file was requested. You can choose not to include one type of item in the cache file by setting its <kbd><a href="#lowmem">LOWMEM</a></kbd> to 3; for example, specify <pre> HOSTLOWMEM 3 </pre> to exclude hosts from the cache file. Because this is a serious step, analog will produce a warning if you do this. You can even set all six <kbd>LOWMEM</kbd>s to 3 if you just want to remember the pattern of requests over time, not even which files were requested. 
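<p> As a rough sketch of how the commands in this section fit together, the two fragments below first build a cache file from a logfile, and later produce a report from the cache file alone. The filenames <kbd>old.log</kbd> and <kbd>old.cache</kbd> are made up for illustration, and the two fragments are assumed to live in separate configuration files used on separate runs of analog. <pre>
# Run 1: make a cache file from the logfile (hosts deliberately left out)
LOGFILE old.log
CACHEOUTFILE old.cache   # this file must not already exist
HOSTLOWMEM 3             # analog will warn about this

# Run 2: make a report from the cache file only
LOGFILE none
CACHEFILE old.cache
</pre> Leave out the <kbd>HOSTLOWMEM 3</kbd> line if you want the hosts to be remembered in the cache file. Any aliases, inclusions and exclusions you want belong in the first run, because they cannot be applied accurately once only the cache file is left.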
<hr> When using the cache files, you have to be careful to store separate data in each cache file. So you shouldn't use an old cache file to make a new cache file, and then analyse both cache files together. And you shouldn't use the same logfile to make two different cache files, and then analyse both cache files together. To avoid losing entries or double counting them, I suggest you follow the following procedure. <ol> <li>Archive the old logfile, and restart the server with a fresh logfile. (See your server documentation for how to do this.) <li>Make both a cache file and an ordinary report from the old logfile. <li>Make a test report from the cache file and compare it against the report from the logfile to check it works. (This step really is worth doing!) <li>Make the main report from all your cache files, old and new. </ol> Now you can throw away the old logfile, if you've really understood what data you're losing by doing so. (But please remember that I can take no responsibility if something goes wrong: see the <a href="Licence.txt">licence</a>.) <p> I prefer to make a separate cache file from each logfile, in case something goes wrong with one of them, rather than a single cache file combining several logfiles, or a single cache file combining an old cache file and a logfile. <hr> <hr> <a name="dns"><h2>DNS lookups</h2> </a> Sometimes a logfile contains numerical IP addresses - like 131.111.20.59 - for the computers that have visited you, instead of names like lion.statslab.cam.ac.uk. This section describes how you can get analog to do so-called <i>DNS lookups</i> to translate these numbers into names. This relies on you having a suitably configured system: DNS lookups are not possible on some systems. <p> Unfortunately DNS lookups are typically very slow, because your computer has to ask across the network to find out the names of the hosts. 
For this reason, analog saves the addresses it has looked up in a file, so that you don't have to look them up again next time. (Even so, you may find the DNS lookups too slow to be usable.) The file is specified by a command like <pre> DNSFILE dnsfile.txt </pre> You will still need to use one of the commands in the next paragraph in order to actually use the file. <p> There are four possible levels of DNS activity. If you specify <kbd>DNS NONE</kbd>, no numerical addresses will be resolved. If you specify <kbd>DNS READ</kbd>, then analog will read the DNS file for old lookups, but no new lookups will take place. This mode is suitable if you are running analog while not connected to the internet. The third level is <kbd>DNS WRITE</kbd>. This reads the old file, looks up new addresses, and adds them to the file. (The first time you use <kbd>DNS WRITE</kbd>, you will get a missing-file warning as it tries to read the old file, but it will exist the next time.) The final level is <kbd>DNS LOOKUP</kbd>. This reads the old file and looks up new addresses, but doesn't add the new addresses to the file, so that they will not be remembered for next time. This is not normally a level that the user wants to specify, but analog will switch to this behaviour if <kbd>DNS WRITE</kbd> fails for some reason. <p> If you are using a <kbd><a href="#include">HOSTEXCLUDE</a></kbd> command, you need to exclude the numerical IP address if it can't be resolved, or the name if it can. In other words, exclude whatever the host is known as in the report. <hr> If two copies of analog were allowed to write to the DNS file at the same time, the file could become corrupted. So when analog is running in <kbd>DNS WRITE</kbd> mode, it creates a <em>lock file</em> which tells other copies of analog to back off to <kbd>DNS LOOKUP</kbd>. 
You can change the location of that file with the command <pre> DNSLOCKFILE filename </pre> Of course you should make sure that all copies of analog use the same lock file, at least if they have the same DNS file! If analog crashes, it may not clear up the lock file, so in that case you may have to delete it yourself. (Disclaimer: on some systems, race conditions may occasionally thwart this mechanism, but this is very unlikely.) <p> Analog never deletes anything from the DNS file: this means that the DNS file will grow, and can become quite large. You should delete the top of it every so often. <p> There are two parameters which say how long to trust old lookups for. If you set <pre> DNSGOODHOURS 672 </pre> for example, then successful lookups will be checked again after 672 hours (4 weeks). You can also set the <kbd>DNSBADHOURS</kbd> similarly, to check failed lookups again after a certain time. <p> Finally, there is a <a href="#debugs">debugging</a> command, <kbd>DEBUG +D</kbd> to show all the DNS lookups that analog is making. <hr> There are lots of tools to help with the DNS lookups on the <a href="#helpers">helper applications</a> page. <hr> Normally you need never write a DNS file: you should rely on analog to do it for you. But in case you need to know, the format of the file is <pre> timestamp IP_address name </pre> where the timestamp is the number of minutes since the beginning of 1970, GMT (i.e., "Unix time" divided by 60), and the name is just <kbd>*</kbd> if the address couldn't be resolved. <hr> <hr> <a name="lowmem"><h2>Coping with low memory</h2> </a> This section describes how to run analog with lower amounts of memory. For a normal logfile this will make analog run a bit slower. But if your computer is running out of memory when running analog, it will go very slowly indeed: so for large logfiles, this can make analog run much faster, or even make an analysis possible that wouldn't otherwise be possible. 
<p> Recall what happens to an item when it has been read in. First it is <a href="#alias">aliased</a>. Secondly, it is checked to see whether it is <a href="#include">included or excluded</a>. Then finally, if all the items are wanted, one request is added to its score. <p> Normally the name of the item is saved before the aliasing takes place. This avoids analog having to do the aliasing again next time the same item is encountered. But this can take up more memory than necessary. So there is a family of <kbd>LOWMEM</kbd> commands provided, which tell analog to record the name at a later stage, or even not at all. If you use these commands, analog will have to do a bit more work than normal, but it will use less memory. On most sites, the hosts take up most of the memory, so I'll use the <kbd>HOSTLOWMEM</kbd> command as an example. <p> The command <pre> HOSTLOWMEM 0 </pre> represents the normal case, when the hostname is recorded before being aliased. If you specify <pre> HOSTLOWMEM 1 </pre> instead, then the hostname is not recorded until after the aliasing. If you specify <pre> HOSTLOWMEM 2 </pre> then the name is not recorded until after the inclusion and exclusion lookup has been done as well. And finally, if you give the command <pre> HOSTLOWMEM 3 </pre> then the hostname is not saved at all, and the Host Report will not be constructed, even if you've asked for it. (The Domain Report can still be constructed though.) The analogous commands for the other items are <kbd>FILELOWMEM</kbd>, <kbd>BROWLOWMEM</kbd>, <kbd>REFLOWMEM</kbd>, <kbd>USERLOWMEM</kbd> and <kbd>VHOSTLOWMEM</kbd>. <hr> So what should you do if analog runs out of memory? First, look in your logfile to see which items are taking up all the memory. If you have lots of different filenames, ones you generate on the fly for example, you would want to use the <kbd>FILELOWMEM</kbd> commands. 
Maybe you could combine all the similar filenames into one with a <kbd>FILEALIAS</kbd> command, and use <kbd>FILELOWMEM 1</kbd>. (If you have lots of different filenames caused by different search arguments, then using <kbd><a href="#ARGSEXCLUDE">ARGSEXCLUDE</a></kbd> might solve your problem without any need to use <kbd>LOWMEM</kbd> at all). But for most users, it is the hostnames which cause the problem. If you only want to analyse requests from certain hosts, then you could use <kbd>HOSTLOWMEM 2</kbd> to exclude the others before recording those that are left. If you don't want to exclude any hosts, and you haven't got enough memory to record all the different hostnames, then <kbd>HOSTLOWMEM 3</kbd> would be appropriate. <hr> <hr> <a name="debug"><h2>Debugging</h2> </a> This section lists commands to help you debug analog, if you think it's going wrong. There's another section later which lists all the <a href="#errors">errors and warnings</a> which analog can generate, and what they all mean, and another section which tells you <a href="#mailing">how to report bugs</a>. <p> First, remember the option we mentioned before, to list the current settings of all of analog's variables. To get this, just put <kbd>-settings</kbd> on the command line, or <kbd>SETTINGS ON</kbd> in one of your configuration files, along with your other commands. Then analog will produce the list of settings instead of running in the normal way. <hr> <a name="debugs">There are commands</a> which control how much debugging information and warning information analog gives out while it is running. By default you get all the warnings and no debugging, but you can change this by means of the commands <kbd>DEBUG</kbd> and <kbd>WARNINGS</kbd>. If you say <pre> DEBUG ON </pre> you get all the debugging. (And <kbd>DEBUG OFF</kbd> turns it all off.) You can also get just certain categories of debugging. 
The categories are <dl compact> <dt><kbd>C</kbd><dd>list all corrupt logfile lines <dt><kbd>D</kbd><dd>information about DNS lookups <dt><kbd>F</kbd><dd>information about file opening and closing <dt><kbd>S</kbd><dd>summary information about each logfile when it's closed <dt><kbd>U</kbd><dd>list unknown domains <dt><kbd>V</kbd><dd>list hosts without a domain (i.e., without a dot) </dl> So, for example, the command <pre> DEBUG FS </pre> would give you information about file opening and closing, and what was in each logfile, but none of the other sorts of debugging. Each line of debugging information is prepended with its code letter. You can also specify <pre> DEBUG +CD </pre> to add <kbd>C</kbd> and <kbd>D</kbd> category debugging to whatever you've already got, and <pre> DEBUG -CD </pre> to remove those two categories. <p> There is also a command line abbreviation for this command. Use <kbd>+V</kbd> (for <kbd>ON</kbd>), <kbd>-V</kbd> (for <kbd>OFF</kbd>), <kbd>+VFS</kbd> (to select exactly options <kbd>FS</kbd>), <kbd>+V+FS</kbd> (to add those options), and <kbd>+V-FS</kbd> (to remove them). <p> The <kbd>C</kbd> messages actually come on two lines. The first line gives the logfile line which was corrupt. The second line indicates where analog first noticed a problem. (This is usually, but not always, close to where the problem actually was!) In fact, each "line" of the message may spread over more than one line on your screen, and you have to be careful to take that into account when trying to find out where the logfile line was corrupt. <hr> <a name="WARNINGS">The <kbd>WARNINGS</kbd></a> command acts similarly. As well as <kbd>WARNINGS ON</kbd> and <kbd>WARNINGS OFF</kbd>, there are warnings in the following categories. 
<dl compact> <dt><kbd>C</kbd><dd>invalid configuration specified <dt><kbd>D</kbd><dd>dubious configuration specified <dt><kbd>E</kbd><dd><kbd>ERRFILE</kbd> command used (see below) <dt><kbd>F</kbd><dd>files missing or corrupt <dt><kbd>L</kbd><dd>apparent problems in logfiles <dt><kbd>M</kbd><dd>possible problems in logfiles <dt><kbd>R</kbd><dd>turning off empty reports </dl> Warnings range from the probably harmless to the usually serious. See the section on <cite><a href="#warns">Errors and warnings</a></cite> for more details about the various categories. Again, warnings are printed with their code letters. <p> There is also a command line version of the <kbd>WARNINGS</kbd> command, looking like <kbd>+q</kbd>, <kbd>-q</kbd>, <kbd>+q&lt;options&gt;</kbd>, <kbd>+q+&lt;options&gt;</kbd> or <kbd>+q-&lt;options&gt;</kbd>. <hr> <a name="PROGRESSFREQ">There is one more command</a> which is useful when trying to debug analog. If you give the command <pre> PROGRESSFREQ 20000 # say </pre> then analog will produce a little message after every 20,000 lines it reads from the logfile. This is useful to determine whether the program has really stopped or (as is more likely) is just being slow for some reason (such as using DNS lookups). <hr> <a name="ERRFILE">To start with</a>, all these messages go to <i>standard error</i>, which is normally just the screen. But you can change that by means of a command like <pre> ERRFILE newfile </pre> If you do this, analog will warn you that it's redirecting the messages, just so that you don't miss any. To change back to standard error, use <pre> ERRFILE stderr </pre> The <kbd>ERRFILE</kbd> command will erase any previous contents of that file. (So don't use the same <kbd>ERRFILE</kbd> command twice, or you may lose messages!) <hr> <a name="ERRLINELENGTH">There is a command</a> called <kbd>ERRLINELENGTH</kbd> to tell analog the width of screen you want these messages to fit in. 
As a special case, <pre> ERRLINELENGTH 0 </pre> specifies an unlimited screen width. <hr> There is just one more section about analog's configuration commands and command line arguments, but it's a rather long one, on the <a href="#form">form interface</a>. (This is a way of running analog by selecting options from a web page.) You might prefer to go straight onto the section on <cite><a href="#meaning">What the results mean</a></cite>. <hr> <hr> <a name="form"><h2>Form interface and CGI program</h2> </a> The form interface provides an HTML front end to analog, on Unix or Windows platforms (and maybe others). That means that users can select options from a web page, instead of having to create a configuration file. <p> <strong>Important:</strong> For <strong><a href="#notcgi">security reasons</a></strong>, you must not attempt to run analog itself as a CGI program, or even leave it in the directory or folder with your web files or CGI programs. When the form interface runs analog for you, it checks that analog isn't given any dangerous options. Without this check, your system could be vulnerable to attack. <p> Please don't try and set up the form until analog has been set up and is running properly on its own. It just adds another level of complexity to troubleshoot. And unlike analog itself, the form interface will <em>not</em> run "out of the box". You have to read this section to find out how to set it up. <p> The form interface is suitable for ordinary users to use, but it <b>needs to be set up by a system administrator</b> or other expert. In order to set it up, you have to be running a web server. You need to know what CGI programs are, where they live on your server, and how to set up their permissions properly. You also need to know how to write HTML forms. I shall assume this level of background knowledge for the rest of this section. 
And you have to be running Perl 5.001 or later: see <cite><a href="#formtech">Technical details</a></cite> below for other system requirements. (Actually, if you're on Windows and don't have Perl, you can download an executable version of the form interface from the <a href="#helpers">helper applications page</a>.) <p> <strong>Warning:</strong> CGI programs can contain security loopholes which allow an unscrupulous user to harm your system. (If you don't know about this, you shouldn't be running CGI programs at all. Read and understand the <a href="http://www.w3.org/Security/Faq/">World Wide Web Security FAQ</a> and the <a href="http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt">CGI Security FAQ</a> first.) I have tried to make this form interface safe, but I cannot guarantee it. Even the most carefully-designed CGI programs can accidentally have serious security bugs. And I take no responsibility if anything goes wrong: you use it at your own risk. (See the <a href="Licence.txt">licence</a>.) Furthermore, you should be aware that unless you take special measures like password protection or limiting <kbd>anlgform.pl</kbd> to specific hostnames, setting up the form interface implies making analog executable, and your logfiles analysable, by anyone on the internet. There are more <a href="#security">notes on security design</a> in this program towards the end of this section. <p> The form interface consists of two parts: a form (called <kbd>anlgform.html</kbd>) to choose the options, and a cgi program (called <kbd>anlgform.pl</kbd>) to pass them to the analog program. Both <kbd>anlgform.html</kbd> and <kbd>anlgform.pl</kbd> <b>must</b> be configured to your system before they will work at all. There are instructions at the top of both files explaining how to do this. <p> The form which is distributed with the program should only be regarded as an example form. You can find forms in languages other than English in the <kbd>lang</kbd> directory. 
Or you can write your own if you prefer. In fact you don't actually need the form at all: if you want just to create a link to the cgi program, with the arguments passed after a question mark in the URL in the usual way, then that's fine. <hr> Almost every analog configuration command can be specified on the form, just by including a form element with that name on the form. So, for example, if you wanted to add a field for users to choose a logfile, you could write <pre> Logfile name: &lt;input type=text name="LOGFILE"&gt; </pre> or maybe something like <pre> &lt;select name=LOGFILE size=1&gt; &lt;option value="/var/log/apache/fred"&gt; Fred's logfile &lt;option value="/var/log/apache/jane"&gt; Jane's logfile &lt;/select&gt; </pre> <p> There are a few commands which you can't specify on the form for security or performance reasons. The full list is <kbd>*LOGFORMAT</kbd>, <kbd>LANGFILE</kbd>, <kbd>HEADERFILE</kbd>, <kbd>FOOTERFILE</kbd>, <kbd>UNCOMPRESS</kbd>, <kbd>OUTFILE</kbd>, <kbd>CACHEOUTFILE</kbd>, <kbd>ERRFILE</kbd>, <kbd>DNS</kbd> and <kbd>SETTINGS</kbd>; and the person setting up the form can add more. There are also certain arguments you can't give to commands: the most important is that you can't include the wildcard <kbd>*</kbd> in the <kbd>LOGFILE</kbd>. See the <a href="#security">security notes</a> below for the reasons for these exclusions, and for some more commands you might want to add to the forbidden list. <hr> Some commands are most conveniently specified in two halves. First, there are commands which take two arguments (for example <a href="#alias"><kbd>ALIAS</kbd>es</a>). You can cope with these by sending two commands from the form, called <kbd>COMMAND1</kbd> and <kbd>COMMAND2</kbd>. For example, <pre> Alias this file: &lt;input type=text name="FILEALIAS1"&gt; To this one: &lt;input type=text name="FILEALIAS2"&gt; </pre> You can only specify one such pair this way; so there's no way to specify several of the same <kbd>ALIAS</kbd>, for example. 
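<p> As a rough illustration of the <kbd>COMMAND1</kbd>/<kbd>COMMAND2</kbd> idea, here is a minimal Python sketch of how two form fields could be joined into one two-argument command. It is not the code of <kbd>anlgform.pl</kbd> (which is a Perl program); the function name <kbd>join_pairs</kbd> and the choice to keep the last value seen for each half are assumptions made only for this example.

```python
def join_pairs(fields):
    """Join NAME1/NAME2 form fields into 'NAME first second' command lines.

    `fields` is a list of (name, value) tuples in the order the form sent them.
    Hypothetical helper for illustration; not taken from anlgform.pl.
    """
    firsts = {}   # command name -> first argument (from NAME1)
    seconds = {}  # command name -> second argument (from NAME2)
    for name, value in fields:
        if name.endswith("1"):
            firsts[name[:-1].upper()] = value
        elif name.endswith("2"):
            seconds[name[:-1].upper()] = value

    commands = []
    for base, first in firsts.items():
        second = seconds.get(base)
        # Only emit a command when both halves were filled in.
        if first and second:
            commands.append(f"{base} {first} {second}")
    return commands


if __name__ == "__main__":
    form = [("FILEALIAS1", "/index.html"), ("FILEALIAS2", "/")]
    for line in join_pairs(form):
        print(line)
```

Because each half is stored under its command name, a second <kbd>FILEALIAS1</kbd>/<kbd>FILEALIAS2</kbd> pair simply replaces the first in this sketch, which mirrors the one-pair-per-command limitation described above.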
<p> Then there are <a href="#FLOOR"><kbd>FLOOR</kbd></a> commands. To avoid users of the form having to know the syntax of these commands, you can if you want specify them in two halves, <kbd>FLOORA</kbd> and <kbd>FLOORB</kbd>, and they will be stuck together. For example, the form distributed with the program specifies <pre> &lt;br&gt;Include all domains with at least &lt;input type=TEXT name="DOMFLOORA" maxlength=6 size=6&gt; &lt;select name="DOMFLOORB"&gt; &lt;option value=r&gt;requests &lt;option value=p&gt;requests for pages &lt;option value=b selected&gt;bytes &lt;/select&gt; </pre> If <kbd>DOMFLOORA</kbd> contains <kbd>5%</kbd> and <kbd>DOMFLOORB</kbd> contains <kbd>r</kbd>, then <kbd>DOMFLOOR 5%r</kbd> will be sent to the program. (Or <kbd>DOMFLOORA=5</kbd> and <kbd>DOMFLOORB=%r</kbd> would work too, if you chose to present the form that way.) <hr> <a name="formqv">There are a couple</a> of extra non-analog commands which can be sent from the form. First, if the option <kbd>qv=1</kbd> is set, then analog is not run, but a list of the configuration commands which would have been sent to analog is printed instead. This is useful for checking that the CGI program is working properly. It can also allow users to produce a configuration file from form settings. <p> Secondly, you can specify other configuration files to be included at specific times. When analog is called by the CGI program, it first processes the <a href="#specialcfgs">default configuration file</a> as usual. Then it processes any configuration file specified by an option with name <kbd>cg</kbd>. Then it processes all the other commands which the CGI program specifies. After that, it processes any configuration file specified by an option with name <kbd>cm</kbd>. Finally, it processes the <a href="#specialcfgs">mandatory configuration file</a> as usual. (You may therefore want two copies of analog, one for form use and one for non-form use, with different configuration files compiled in.) 
Note that the commands in the default and mandatory configuration files will contribute to the configuration: some of them may even override options specified on the form. For example, if the default configuration file contains an <kbd><a href="#include">INCLUDE</a></kbd> command, this may cause <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands specified on the form to behave unexpectedly. <hr> <kbd>anlgform.pl</kbd> usually sends the commands to analog in the order in which it received them, which should be the same as the order they occurred in the form. But there are some exceptions. First, all commands of the same name are grouped together. So an interleaved sequence of <kbd>INCLUDE</kbd>s and <kbd>EXCLUDE</kbd>s won't work, for example. Secondly, even though the names of commands are case-insensitive, commands of the same name but in different cases may come in the wrong order. Keep them in the same case! Thirdly, <kbd>WARNINGS</kbd> and <kbd>LOGTIMEOFFSET</kbd> are sent first (and thus the <kbd>LOGTIMEOFFSET</kbd> applies to any logfiles specified on the form). <p> <a name="formalways">There are</a> a couple of commands which the form always sets. These may override what you have set elsewhere. First, it sets either <kbd>DNS READ</kbd> (if a <kbd>DNSFILE</kbd> is set on the form) or <kbd>DNS NONE</kbd> (otherwise). You can override this behaviour in the mandatory configuration file, but you are likely to run into timeout problems if you do. Secondly, it always sets <kbd>WARNINGS FL</kbd>, so that the less important warnings don't fill up your server's error log. You can override this by sending an explicit <kbd>WARNINGS</kbd> command from the form. <p> <a name="formuncompress">There is one small point</a> about compressed logfiles. For security reasons, when using the form interface you need to specify the full pathname to the uncompression command in the <kbd><a href="#UNCOMPRESS">UNCOMPRESS</a></kbd> command in your configuration file. 
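<p> The ordering exceptions described earlier in this section (commands of the same name grouped together, and <kbd>WARNINGS</kbd> and <kbd>LOGTIMEOFFSET</kbd> sent first) can be pictured with a small Python model. This is only a sketch of the behaviour as described, not the actual Perl code of <kbd>anlgform.pl</kbd>; the function name <kbd>order_commands</kbd> is hypothetical, and the relative order of <kbd>WARNINGS</kbd> and <kbd>LOGTIMEOFFSET</kbd> is an assumption.

```python
# Commands that are sent ahead of everything else (their relative order
# here is an assumption made for this sketch).
SENT_FIRST = ("WARNINGS", "LOGTIMEOFFSET")


def order_commands(fields):
    """Model the order in which form commands reach analog.

    `fields` is a list of (name, value) tuples in form order.
    Names are compared case-sensitively, so 'fileinclude' and
    'FILEINCLUDE' end up in different groups.
    """
    groups = {}  # dicts keep insertion order in Python 3.7+
    for name, value in fields:
        groups.setdefault(name, []).append(value)

    ordered = []
    for name in SENT_FIRST:
        for value in groups.pop(name, []):
            ordered.append((name, value))
    # Remaining commands: grouped by name, groups in order of first appearance.
    for name, values in groups.items():
        for value in values:
            ordered.append((name, value))
    return ordered


if __name__ == "__main__":
    form = [
        ("FILEINCLUDE", "/a/*"),
        ("FILEEXCLUDE", "/a/private/*"),
        ("FILEINCLUDE", "/b/*"),
        ("WARNINGS", "FL"),
    ]
    for name, value in order_commands(form):
        print(name, value)
```

In this model the two <kbd>FILEINCLUDE</kbd> commands come out next to each other, ahead of the <kbd>FILEEXCLUDE</kbd>, which is why an interleaved sequence on the form does not survive the trip to analog.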
<hr><h3><a name="trouble">Troubleshooting</a></h3> Here is what to do if you are having problems setting up the form interface. <p> First, you can run <kbd>anlgform.pl</kbd> from the (DOS or Unix) command line. This is good enough to debug most problems. You can specify options in pairs like this: <pre> anlgform.pl qv=1 LOGFILE=/some/log REQINCLUDE=pages </pre> If you include <kbd>qv=1</kbd> in the argument list as above, you will see what <kbd>anlgform.pl</kbd> is trying to send to analog. If you don't include <kbd>qv=1</kbd>, <kbd>anlgform.pl</kbd> will try and run analog. <p>If it still doesn't work, check the following points: <ol> <li>Have you edited <kbd>anlgform.pl</kbd> and <kbd>anlgform.html</kbd> as instructed at the top of those files? <li>Do other CGI programs work on your server? Is <kbd>anlgform.pl</kbd> in the right place to be recognised as a CGI program by the server? <li>Look in the server's error log for clues. <li>Are all relevant files (analog itself, logfiles, configuration files, auxiliary files such as domain files...) executable/readable by your web server? <li>If some form options don't seem to take effect, then check whether they are being overridden by a command in a configuration file. <li>If you get a long wait, then no data returned, the server is probably timing out the request before analog has finished. The remedy is to increase the timeout interval. <li>As explained <a href="#formalways">above</a>, the form always sets <kbd>DNS READ</kbd> or <kbd>DNS NONE</kbd>, and <kbd>WARNINGS FL</kbd>, overriding your default configuration file. <li>Again as explained <a href="#formuncompress">above</a>, uncompressing of compressed logfiles doesn't work unless you use the full pathname in the <kbd>UNCOMPRESS</kbd> command. </ol> <hr><h3><a name="security">Security notes</a></h3> As I said above, CGI programs can often contain security loopholes. 
Although I <a href="Licence.txt">don't guarantee</a> that the form interface is safe, I have done my best to make it so. Here I shall explain my design decisions. Comments on them are of course welcome: if they need to remain confidential, you can e-mail me privately at <kbd><a href="mailto:analog-author@lists.isite.net">analog-author@lists.isite.net</a></kbd>. <p> First, you should think about who can run the form interface. Unless you take special measures like password protection or limiting <kbd>anlgform.pl</kbd> to specific hostnames, adding the form interface to your site implies making analog executable, and your logfiles analysable, by anyone on the internet. There are obvious concerns both about privacy and about the load on your system. <p> Certain commands are ignored by <kbd>anlgform.pl</kbd> and not passed to analog. The list of them can be found at the top of <kbd>anlgform.pl</kbd>. Here are the reasons for them. <kbd>HEADERFILE</kbd> and <kbd>FOOTERFILE</kbd> would place any file on your system within the output. The <kbd>*LOGFORMAT</kbd> commands would also allow any file to be read, because someone could designate each line to be a single filename and then just list the filenames. <kbd>OUTFILE</kbd>, <kbd>CACHEOUTFILE</kbd> and <kbd>ERRFILE</kbd> would allow people to write to your filespace; <kbd>ERRFILE</kbd> would also divert errors away from your error log. <kbd>UNCOMPRESS</kbd> would allow a user to execute any command. <kbd>DNS</kbd> is forbidden because setting it higher than <kbd>READ</kbd> would normally cause the process to time out. <p> None of the above should be deleted (unless you are really, really sure that it's completely impossible for anyone other than yourself to run <kbd>anlgform.pl</kbd>). There are two other commands which are forbidden by default but which you could consider removing from the forbidden list. <kbd>SETTINGS</kbd> is included because it will give away the locations of some files on your system. 
But it is useful for diagnostic purposes, and you could consider removing it temporarily if you have trouble setting up the form. The other command which is included is <kbd>LANGFILE</kbd>, although I consider it to be a lower risk. It is included because it is theoretically possible that another file could be exactly the right number of lines long to be accepted as a language file, and then parts of it would get into the output. But it would have to be exactly the right length first. If that's a risk you're prepared to take, you can remove <kbd>LANGFILE</kbd> from the list. <p> There are other commands which you might consider adding to the list. For example, it is theoretically possible (though rather unlikely), that another file on your system could conform sufficiently closely to one of the predefined log formats that analog could be persuaded to analyse it and so reveal some of its contents. If you're worried about this, or even if you want to force only one particular logfile to be analysed from the form, you can add the <kbd>LOGFILE</kbd> command to the list of forbidden commands. And you could add <kbd>DOMAINSFILE</kbd> for similar reasons. <p> You can of course add any command you like to the list. For example, a user can use any configuration file on your system unless you add all of <kbd>CONFIGFILE</kbd>, <kbd>CM</kbd> and <kbd>CG</kbd>. Or if you wanted to stop a user having control of which warnings were written to the error log, you could add <kbd>WARNINGS</kbd>. <hr> For those who know about CGI security issues, here are some more technical comments on my design. <kbd>anlgform.pl</kbd> sets the <kbd>$PATH</kbd> environment variable to be empty. It opens <kbd>analog</kbd> as a pipe in order to pass arguments into analog's standard input. User-specified data is not used for the <kbd>open()</kbd> function, only passed down the pipe. <kbd>anlgform.pl</kbd> is run with the <kbd>-T</kbd> flag on Unix. 
(Does anyone know how to get this working under Windows?) <p> The arguments to <kbd>LOGFILE</kbd> and <kbd>CACHEFILE</kbd> commands are checked for containing only certain allowed characters (specifically, letters, digits, <kbd>/\.:_</kbd> space, and <kbd>-</kbd> between two {letter, digit, underscore}'s). This is because they could match an <kbd>UNCOMPRESS</kbd> command and thus be passed to the shell when the uncompress command is <kbd>popen()</kbd>'ed. <p> Apart from that, command names are checked for containing only letters and the digits 1 and 2; and the arguments to commands are checked for not containing control characters (actually characters 0-32 and 127-159; in particular newline characters are prohibited). The length of the commands isn't checked by <kbd>anlgform.pl</kbd>, but buffer overflow shouldn't be an issue as configuration commands are checked for length by analog. <p> <a name="notcgi">By the way</a>, the reason that I advise that analog itself shouldn't be used as a CGI program is that some servers, notably Microsoft IIS, allow users to pass command line arguments into a CGI program. And even if the program doesn't return the proper CGI headers, the output can be sent back to the user. This means that all the above checking of arguments is then thwarted. Of course, on servers on which you can't pass command line arguments to a CGI program, there are not the same security concerns, but then analog isn't very useful as a CGI program because if you can't pass any arguments, you can only get the default output. <hr><h3><a name="formtech">Technical details</a></h3> You need to be running Perl 5.001 or later (unless you're on Windows and download the executable version of the form interface from the <a href="#helpers">helper applications page</a>). You can get the latest version of Perl free from <a href="http://www.perl.org">www.perl.org</a>. 
You also need the module <kbd><a href="http://www.cpan.org/modules/by-module/CGI/">CGI.pm</a></kbd>, but this should have come with Perl anyway. <p> On Windows, you have to associate the <kbd>.pl</kbd> extension with the Perl executable so that Perl scripts are executed by Perl. <p> <kbd>anlgform.pl</kbd> will understand the <kbd>GET</kbd> or <kbd>POST</kbd> methods of form submission. The <a href="http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.1">HTML spec</a> says that <kbd>GET</kbd> should be used when, as in this case, running the program has no side effects. However, section 15.1.3 of the <a href="ftp://ftp.isi.edu/in-notes/rfc2616.txt">HTTP spec</a> says that <kbd>POST</kbd> should be used if some of the options being passed might be confidential. Also, very long URLs, formed by specifying lots of options, can cause trouble to some older servers. So <kbd>anlgform.html</kbd> uses the <kbd>POST</kbd> method by default. However, the <kbd>GET</kbd> method will also work. For example, you could make a normal link to <kbd>anlgform.pl</kbd> with options specified after a question mark in the usual <kbd>GET</kbd> way. <hr> <hr> <a name="meaning"><h2>What the results mean</h2> </a> This section of the Readme is about understanding the results analog produces. It's divided into three subsections. <ul> <li><cite><a href="#webworks">How the web works</a></cite>. This section discusses what happens when somebody connects to your web site, and what you can and can't find out about them. If you think that you can get statistics on how many people have visited your web site (or want to know why you can't), then this section is for you. <li><cite><a href="#reports">Analog's reports</a></cite>. This section gives a summary of analog's reports, what they contain, and which commands influence each one. <li><cite><a href="#defns">Analog's definitions</a></cite>. 
This section gives precise details on all of analog's terminology, exactly what is counted in each report, and so on. </ul> <hr> <hr> <a name="webworks"><h2>How the web works</h2> </a> This section is about what happens when somebody connects to your web site, and what statistics you can and can't calculate. There is a lot of confusion about this. It's not helped by statistics programs which claim to calculate things which cannot really be calculated, only estimated. The simple fact is that certain data which we would like to know and which we expect to know are simply not available. And the estimates used by other programs are not just a bit off, but can be very, very wrong. For example (you'll see why below), <em>if your home page has 10 graphics on, and an AOL user visits it, most programs will count that as 11 different visitors!</em> <p> This section is fairly long, but it's worth reading carefully. If you understand the basics of how the web works, you will understand what your web statistics are really telling you. <hr> <b>1. The basic model.</b> Let's suppose I visit your web site. I follow a link from somewhere else to your front page, read some pages, and then follow one of your links out of your site. <p> So, what do you know about it? First, I make one request for your front page. You know the date and time of the request and which page I asked for (of course), and the internet address of my computer (my <i>host</i>). I also usually tell you which page referred me to your site, and the make and model of my browser. I do not tell you my username or my e-mail address. <p> Next, I look at the page (or rather my browser does) to see if it's got any graphics on it. If so, and if I've got image loading turned on in my browser, I make a separate connection to retrieve each of these graphics. I never log into your site: I just make a sequence of requests, one for each new file I want to download. The referring page for each of these graphics is your front page. 
Maybe there are 10 graphics on your front page. Then so far I've made 11 requests to your server. <p> After that, I go and visit some of your other pages, making a new request for each page and graphic that I want. Finally, I follow a link out of your site. You never know about that at all. I just connect to the next site without telling you. <hr> <b>2. Caches.</b> It's not always quite as simple as that. One major problem is cacheing. There are two major types of cacheing. First, my browser automatically caches files when I download them. This means that if I visit them again, the next day say, I don't need to download the whole page again. Depending on the settings on my browser, I might check with you that the page hasn't changed: in that case, you do know about it, and analog will count it as a new request for the page. But I might set my browser not to check with you: then I will read the page again without you ever knowing about it. <p> The other sort of cache is on a larger scale. I'm in the UK. Because the link across the Atlantic is sometimes very congested, we've set up a national cache. (Many individual ISP's also do the same thing.) I can set my browser to get your pages from the national cache instead of directly from you. If anyone else in the country has used the cache to look at your pages recently, the cache will have saved them, and will give them out to me without ever telling you about it. So hundreds of people could read your pages, even though you'd only sent it out once. Also, if the page I wanted wasn't already stored in the cache, the cache would ask for it from you on my behalf. This would mean that the request appeared to come from the cache, rather than from me. If several people did this, you would think that only one host was accessing the cache, rather than lots of different ones. <hr> <b>3. 
What you can know.</b> The only things you can know for certain are the number of requests made to your server, when they were made, which files were asked for, and which host asked you for them. <p> You can also know what people told you their browsers were, and what the referring pages were. You should be aware, though, that many browsers lie deliberately about what sort of browser they are, or even let users configure the browser name. Also, a few browsers send incorrect referrers, telling you the last page that the user was on even if they weren't referred by that page. <hr> <b>4. What you can't know.</b> <ol type=i> <li><i>You can't tell the identity of your readers</i>. Unless you explicitly require users to provide a password, you don't know who connected or what their e-mail addresses are. <li><i>You can't tell how many visitors you've had</i>. You can guess by looking at the number of distinct hosts that have requested things from you. But this is not always a good estimate for three reasons. First, if users get your pages from a local cache server, you will never know about it. Secondly, sometimes many users appear to connect from the same host: either users from the same company or ISP, or users using the same cache server. Finally, sometimes one user appears to connect from many different hosts. AOL now allocates users a <a href="http://webmaster.info.aol.com/network.html">different hostname for <i>every request</i></a>. So <em>if your home page has 10 graphics on, and an AOL user visits it, most programs will count that as 11 different visitors!</em> <li><i>You can't tell how many visits you've had</i>. Many programs, under pressure from advertisers' organisations, define a "visit" (or "session") as a sequence of requests from the same host until there is a half-hour gap. This is an unsound method for several reasons. First, it assumes that each host corresponds to a separate person and vice versa. 
This is simply not true in the real world, as discussed in the last paragraph. Secondly, it assumes that there is never a half-hour gap in a genuine visit. This is also untrue. I quite often follow a link out of a site, then step back in my browser and continue with the first site from where I left off. Should it really matter whether I do this 29 or 31 minutes later? Finally, to make the computation tractable, such programs also need to assume that your logfile is in chronological order: it isn't always, and analog will produce the same results however you jumble the lines up. <li><i>Cookies don't solve these problems</i>. Some sites try to count their visitors by using cookies. But this can only work if you refuse to let people read your pages who can't or won't take a cookie. And you still have to assume that your visitors will use the same cookie for their next request. <li><i>You can't follow a person's path through your site</i>. Even if you assume that each person corresponds one-to-one to a host, you don't know their path through your site. It's very common for people to go back to pages they've downloaded before. You never know about these subsequent visits to that page, because their browser has cached them. So you can't track their path through your site accurately. <li><i>You often can't tell where they entered your site, or where they found out about you from</i>. If they are using a cache server, they will often be able to retrieve your home page from their cache, but not all of the subsequent pages they want to read. Then the first page you know about them requesting will be one in the middle of their true visit. <li><i>You can't tell how they left your site, or where they went next</i>. They never tell you about their connection to another site, so there's no way for you to know about it. <li><i>You can't tell how long people spent reading each page</i>. Once again, you can't tell which pages they are reading between successive requests for pages. 
They might be reading some pages they downloaded earlier. They might have followed a link out of your site, and they might or might not return later. They might have interrupted their reading for a quick game of Minesweeper. You just don't know. </ol> The bottom line is that HTTP is a stateless protocol. That means that people don't log in and retrieve several documents: they make a separate connection for each file they want. And <em>a lot of the time they don't even behave as if they were logged into one site</em>. That's why analog reports requests, i.e. what is going on at your server, which you know, rather than guessing what the users are doing. <p> I've presented a somewhat negative view here, emphasising what you can't find out. Web statistics are still informative: it's just important not to slip from "this page has received 30,000 requests" to "30,000 people have read this page." In some sense these problems are not really new to the web -- they are present just as much in print media too. For example, you only know how many magazines you've sold, not how many people have read them. In print media we have learnt to live with these issues, using the data which are available, and it would be better if we did on the web too, rather than making up spurious numbers. <hr> <p> <b>5. Acknowledgements and further reading.</b> Many other people have made these points too. While originally writing this section, I benefited from three earlier expositions: <cite>Interpreting WWW Statistics</cite> by Doug Linder; <cite>Making Sense of Web Usage Statistics</cite> by Dana Noonan; and <cite>Getting Real about Usage Statistics</cite> by Tim Stehle. Unfortunately none of these articles seems to be available on the web any more. <p> Another, extremely well-written document on these ideas is <cite>Measuring Web Site Usage: Log File Analysis</cite> by Susan Haigh and Janette Megarity. 
Being on a Canadian government site, it's available in both <a href="http://www.nlc-bnc.ca/pubs/netnotes/notes57.htm">English</a> and <a href="http://www.nlc-bnc.ca/pubs/netnotes/fnotes57.htm">French</a>. Or for an even more negative point of view, you could read <cite><a href="http://www.cranfield.ac.uk/stats/">Why Web Usage Statistics are (Worse Than) Meaningless</a></cite> by Jeff Goldberg. <hr> <hr> <a name="reports"><h2>Analog's reports</h2> </a> This section summarises all of analog's reports, and the main commands which control them. For details on these commands, see the sections on <cite><a href="#timereps">Time reports</a></cite>, <cite><a href="#othreps">Other reports</a></cite> and <cite><a href="#hierreps">Hierarchical reports</a></cite>. For exact details on what is counted in each report, see the section on <cite><a href="#defns">Analog's definitions</a></cite>. <h3><a name="reptop">Top lines</a></h3> <hr> Program started at Thu-24-Sep-1998 13:48. <br>Analysed requests from Wed-16-Sep-1998 09:52 to Mon-21-Sep-1998 02:04 (4.7 days). <hr> The top two lines of the report tell you when the program was run, and which dates it includes data from. <h3><a name="repgen">General Summary</a></h3> <hr> (Figures in parentheses refer to the 7 days to 24-Sep-1998 13:48).
<br><b>Successful requests:</b> 79,646 (48,947) <br><b>Average successful requests per day:</b> 17,036 (6,992) <br><b>Successful requests for pages:</b> 31,138 (18,689) <br><b>Average successful requests for pages per day:</b> 6,660 (2,669) <br><b>Failed requests:</b> 9,008 (6,378) <br><b>Redirected requests:</b> 344 (235) <br><b>Distinct files requested:</b> 8,180 (2,884) <br><b>Distinct hosts served:</b> 6,640 (4,991) <br><b>Corrupt logfile lines:</b> 2 <br><b>Data transferred:</b> 976.92 Mbytes (627.06 Mbytes) <br><b>Average data transferred per day:</b> 208.96 Mbytes (89.58 Mbytes) <hr> The General Summary contains some overall statistics about the data being analysed: the most important being the number of <b>requests</b> (the total number of files downloaded, including graphics); the number of <b>requests for pages</b> (just counting the various pages on your site); the number of <b>distinct hosts</b> (the number of different computers requests have come from); and the amount of <b>data transferred</b> in bytes. For exactly what the various lines mean, see the section on <cite><a href="#defns">Analog's definitions</a></cite>. <p> The figures in parentheses represent the seven days given at the top of this report: it's the seven days before the <kbd>TO</kbd> time if there was a <kbd>TO</kbd> command, or if not the seven days before the report was run. <p> You can't find out the number of visitors or visits you've had, and don't believe any program which tells you that you can. See the section on <cite><a href="#webworks">How the web works</a></cite> for a discussion of this. <p> You can turn this report on or off with the <kbd><a href="#replist">GENERAL</a></kbd> command. You can include or exclude the figures for the last seven days with the <kbd><a href="#LASTSEVEN">LASTSEVEN</a></kbd> command. You may get slightly different lines to those above, depending on other options you have set. 
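<p> As a rough sketch of the two commands just mentioned, a configuration file might contain the lines below. The <kbd>ON</kbd> and <kbd>OFF</kbd> arguments follow the pattern this section describes for turning reports on and off; treat the exact values as an illustration, and check the linked command descriptions before relying on them.
<pre><tt># Include the General Summary in the output
GENERAL ON
# Leave out the figures in parentheses for the last seven days
LASTSEVEN OFF
</tt></pre>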
<h3><a name="reptime">Monthly, Weekly, Daily, Hourly, Quarter-Hour and Five-Minute Reports</a></h3> <hr> Each unit (<img src="barb1.gif" alt="+">) represents 800 requests for pages, or part thereof. <pre><tt>week beg.: #reqs: pages: ---------: -----: -----: 13/Sep/98: 69614: 25277: <img src="barb32.gif" alt="++++++++++++++++++++++++++++++++"> 20/Sep/98: 10032: 5861: <img src="barb8.gif" alt="++++++++"> </tt></pre> Busiest week: week beginning 13/Sep/98 (26,654 requests for pages). <hr> These reports tell you how many requests there were in each time period. They also tell you which was the busiest time period. <p>You can control whether each report is included or not with the appropriate <a href="#replist"><kbd>ON</kbd> or <kbd>OFF</kbd></a> command. You can control which columns are listed by the <kbd><a href="#timeCOLS">COLS</a></kbd> commands. You can control which measurement to use for the bar charts and the "busiest" line by the <kbd><a href="#GRAPH">GRAPH</a></kbd> commands. You can determine how many rows are displayed with the <kbd><a href="#ROWS">ROWS</a></kbd> commands. You can display the lines backwards or forwards in time by the <kbd><a href="#BACK">BACK</a></kbd> commands. You can change the graphic used for the bar charts with the <kbd><a href="#BARSTYLE">BARSTYLE</a></kbd> command. <h3><a name="reptimesum">Daily and Hourly Summaries</a></h3> <hr> Each unit (<img src="barb1.gif" alt="+">) represents 150 requests for pages, or part thereof. <pre><tt>day: #reqs: pages: ---: -----: -----: Sun: 2031: 1193: <img src="barb8.gif" alt="++++++++"> Mon: 8001: 4668: <img src="barb32.gif" alt="++++++++++++++++++++++++++++++++"> Tue: 0: 0: Wed: 13934: 5915: <img src="barb32.gif" alt="++++++++++++++++++++++++++++++++++++++++"><img src="barb8.gif" alt=""> [etc.] </tt></pre> <hr> These reports tell you the total number of requests in each day of the week, or each hour of the day, over the time period given at the very top of the report. 
(It's not the average, nor is it the figures for just the last week or last day). <p>You can control whether each report is included or not with the appropriate <a href="#replist"><kbd>ON</kbd> or <kbd>OFF</kbd></a> command. You can control which columns are listed by the <kbd><a href="#timeCOLS">COLS</a></kbd> commands. You can control which measurement to use for the bar charts by the <kbd><a href="#GRAPH">GRAPH</a></kbd> commands. You can change the graphic used for the bar charts with the <kbd><a href="#BARSTYLE">BARSTYLE</a></kbd> command. <h3><a name="repoth">Other reports</a></h3> <hr> Listing the first 5 files by the number of requests, sorted by the number of requests. <pre><tt>#reqs: %bytes: last date: file -----: ------: ---------------: ---- 4123: 2.29%: 21/Sep/98 01:57: /~sret1/analog/ 3064: 0.15%: 21/Sep/98 01:54: /~sret1/analog/analogo.gif 1737: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar1.gif 1692: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar16.gif 1685: 0.01%: 21/Sep/98 01:53: /~sret1/images/bar8.gif 67345: 97.54%: 21/Sep/98 02:04: [not listed: 8,175 files] </tt></pre> <hr> The rest of the reports are all quite similar. Here is a list of them. If you're unfamiliar with some of the terms, see the section on <cite><a href="#defns">Analog's definitions</a></cite>. <ul> <li>The Host Report lists all <b>computers</b> which downloaded files from you. <li>The Domain Report lists which <b>countries</b> those computers came from. (If you only get "unresolved numerical addresses", see the <a href="#underfaq">FAQ</a>.) <li>The Organisation Report <a href="#domfile">attempts</a> to list the <b>organisations</b> (companies, institutions, ISPs etc.) which the computer was registered under. <li>The Request Report (the example above) lists which <b>files</b> were downloaded. <li>The Directory Report lists which <b>directories</b> those files came from. <li>The File Type Report lists the <b>file types</b> (actually, extensions) of those files. 
<li>The File Size Report breaks them down by <b>size</b>. <li>The Processing Time Report shows the <b>time taken</b> to serve each file. <li>The Redirection Report lists the filenames which resulted in redirections: mainly directories without the final slash, and "<b>click-thrus</b>". <li>The Failure Report lists the filenames which caused errors. <li>The Referrer Report lists which pages <b>linked</b> to your files. <li>The Referring Site Report lists the servers those referrers were on. <li>The Search Query Report and the Search Word Report list which <b>search terms</b> people used to find your site (provided you've used the appropriate <kbd><a href="#SEARCHENGINE">SEARCHENGINE</a></kbd> commands). <li>The Redirected Referrer Report lists the referrers which led to redirections. <li>The Failed Referrer Report is essentially a <b>broken link</b> report. <li>The Browser Report lists the detailed versions of <b>browsers</b> used, and the Browser Summary collects them by vendor. <li>The Operating System Report lists the <b>operating systems</b> of the visitors whose browser types you know. <li>The <b>Virtual Host</b> Report and the <b>User</b> Report are obvious. <li>The Failed User Report lists the users who caused errors. <li>The Status Code Report lists the number of each <b>HTTP status code</b> that you had. </ul> Whether you can get all of these reports depends on what information is recorded in your logfile. <p>As usual, you can control whether each report is included or not with the appropriate <a href="#replist"><kbd>ON</kbd> or <kbd>OFF</kbd></a> command. You can control which columns are listed by the <kbd><a href="#othCOLS">COLS</a></kbd> commands. You can change how the reports are sorted by the <kbd><a href="#SORTBY">SORTBY</a></kbd> commands. You can control how many items are listed by the <kbd><a href="#FLOOR">FLOOR</a></kbd> commands.
You can include or exclude individual items with the <a href="#outputexcludes">output <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd></a> commands. You can change the names of items in the reports with the <a href="#OUTPUTALIAS"><kbd>OUTPUTALIAS</kbd></a> commands. <p>The "not listed" line at the bottom counts those items which didn't get enough traffic to get above the <kbd>FLOOR</kbd> for the report, and those which were explicitly <kbd>EXCLUDE</kbd>d. <p>Most of these reports have a hierarchical structure, like this example for the Domain Report: <hr> Listing the first 5 domains by the number of requests, sorted by the number of requests. <pre><tt>no.: #reqs: %bytes: domain ---: -----: ------: ------ 1: 13243: 16.23%: .com (Commercial) : 1262: 1.26%: aol.com 2: 11783: 25.64%: .jp (Japan) : 9592: 22.19%: ad.jp : 1043: 1.97%: co.jp 3: 10073: 11.62%: .net (Network) : 1926: 1.71%: uu.net 4: 9657: 13.31%: [unresolved numerical addresses] 5: 7388: 8.04%: .uk (United Kingdom) : 5792: 5.74%: ac.uk : 1510: 1.99%: co.uk : 18502: 25.16%: [not listed: 82 domains] </tt></pre> <hr> You can control which items are listed on the lower levels by the <a href="#hierreps"><kbd>SUB</kbd></a> family of commands. There are also separate <a href="#SUBSORTBY">sub-<kbd>SORTBY</kbd></a> and <a href="#SUBFLOOR">sub-<kbd>FLOOR</kbd></a> commands for the lower levels. (Called <a href="#ARGSSORTBY"><kbd>ARGSSORTBY</kbd></a> and <a href="#ARGSFLOOR"><kbd>ARGSFLOOR</kbd></a> for some reports, such as the Request Report.) Notice that the lower levels are always listed with their parents, so they break up the sort order. Also, they don't count towards the total number of items listed, so there are only 5 domains listed in the example above, as you can see in the first column. (The <a href="#othCOLS"><kbd>N</kbd> column</a> is particularly useful in hierarchical reports for this reason.) 
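<p> The Domain Report example above could plausibly come from settings along the following lines. This is only a sketch: the command names <kbd>DOMCOLS</kbd>, <kbd>DOMSORTBY</kbd>, <kbd>DOMFLOOR</kbd> and <kbd>SUBDOMAIN</kbd> are assumed here to be the Domain Report members of the <kbd>COLS</kbd>, <kbd>SORTBY</kbd>, <kbd>FLOOR</kbd> and <kbd>SUB</kbd> families, and the argument syntax is likewise an assumption, so check the linked sections for the exact forms.
<pre><tt># Columns: item number, number of requests, percentage of bytes
DOMCOLS NRb
# Sort the top level by number of requests
DOMSORTBY REQUESTS
# List only the first five domains by requests
DOMFLOOR -5r
# Ask for these subdomains to be listed beneath their parent domain
SUBDOMAIN ac.uk
SUBDOMAIN co.uk
</tt></pre>
With settings like these, the subdomains would appear indented under <kbd>.uk</kbd> without being numbered, which is why the first column still counts only five domains.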
<p> Which files are linked to in the Request Report is controlled by the <a href="#LINKINCLUDE"><kbd>LINKINCLUDE</kbd> and <kbd>LINKEXCLUDE</kbd></a> commands, and which files are linked to in the various referrer reports is controlled by the <a href="#LINKINCLUDE"><kbd>REFLINKINCLUDE</kbd> and <kbd>REFLINKEXCLUDE</kbd></a> commands. The links in the Request Report are also affected by the <kbd><a href="#BASEURL">BASEURL</a></kbd> command. <h3><a name="repbot">Bottom lines</a></h3> <hr> <i>This analysis was produced by <a HREF="http://www.statslab.cam.ac.uk/~sret1/analog/">analog4.03/Unix</a>. <br><b>Running time:</b> 8 seconds.</i> <hr> At the end of the report you can see which version of analog produced the report, and how long the report took to run. <hr> <hr> <a name="defns"><h2>Analog's definitions</h2> </a> This section describes how analog defines its terms, and exactly what is counted in each category. It gets a bit technical at times -- if you're just trying to understand the reports, I recommend you read the section on <cite><a href="#reports">Analog's reports</a></cite> first. <p> We start with some basic definitions. The <i>host</i> is the computer which has asked you for a file. The file might be a <i>page</i> (i.e., an HTML document) or it might be something else, such as an image. By default filenames ending in <kbd>.html</kbd>, <kbd>.htm</kbd> or <kbd>/</kbd> count as pages but you can tell analog to count any file as a page with the <kbd><a href="#PAGEINCLUDE">PAGEINCLUDE</a></kbd> command. <p> The <i>total requests</i> counts all the files which have been requested, including pages, graphics, etc. (Some people call this the number of hits, but that word is also used in other ways by other people, so I avoid it). The <i>requests for pages</i> obviously only counts pages. The <i>referrer</i> for a request is the place that the user (or his computer) heard about your file from. If he followed a link to reach a page, it will be the previous page.
In the case of a graphic on a page, the referrer will be the page containing the graphic. <hr> Analog recognises four categories of request, based on the HTTP status code of the request. You can see the total number of requests for each status code, and what the codes mean, in the Status Code Report. (Or see the <a href="http://www.w3.org/Protocols/rfc2068/rfc2068">HTTP spec</a> for a detailed description.) <p> First, <i>successful requests</i> are those with HTTP status codes in the 200's (where the document was returned) or with code 304 (where the document was requested but was not needed because it had not been recently modified and the user could use a cached copy). Sometimes the logfile line doesn't contain a status code. These lines are also assumed by analog to be successes. <p> <i>Redirected requests</i> are those with other codes in the 300's, indicating that the user was directed to a different file instead. The most common cause of these requests is that the user has incorrectly requested a directory name without the trailing slash. The server replies with a redirection ("you probably mean the following") and the user then makes a second connection to get the correct document (although usually the browser does it automatically without the user's intervention or knowledge). The other common cause of redirected requests is their use as "click-thru" advertising banners. <p> <i>Failed requests</i> are those with codes in the 400's (error in request) or 500's (server error). They come about for a variety of reasons, but the most common are when the requested file is not found or is read-protected. <p> Finally, <i>requests returning informational status code</i> are those with status codes in the 100's. These are very rare at the moment. <p> There are a few other types of logfile lines listed in the General Summary. 
<i>Lines without status code</i> refers to those logfile lines without a status code, and the successful requests in the General Summary only counts the ones with a status code: except if the line contains the name of the file requested, and the filename is being counted (not starred in the <kbd><a href="#starredfmt">LOGFORMAT</a></kbd>), then it's listed in the successes. <i>Corrupt logfile lines</i> are those which analog didn't manage to parse. And <i>unwanted logfile entries</i> are ones which we have specifically <a href="#include">excluded</a>. Successful requests for pages refers to those lines on which the file requested was named and was a page. <hr> Most reports only include successful requests in calculating the number of requests, requests for pages, bytes, and last date: unless, of course, the report is a redirection or failure report. There is a further restriction on the time reports, the Status Code Report, the Processing Time Report and the File Size Report: the logfile line must also contain the name of the file requested, and the filename must be being counted. This is necessary to stop double counting if the server uses separate logs. <p> The "not listed" line at the bottom of each of the <a href="#othreps">non-time reports</a> includes both those items which were explicitly excluded at the output stage with an <kbd><a href="#outputexcludes">OUTPUTEXCLUDE</a></kbd> command, and those which were not listed because they were below the floor for the report. <p> The figures in parentheses in the General Summary are for the last seven days: either the seven days before the <kbd>TO</kbd> time, or if no <kbd>TO</kbd> time is given, the seven days before the time of the program start. (It would be nicer to use the seven days before the last time in the logfile, but we don't know when this is until we've read the whole logfile, and by then it's too late.) 
The figures for the last seven days are not included if all, or none, of the requests fall in the last seven days. <p> In the Domain Report, "domain not given" means that the hostname did not contain a dot. "Unknown domain" means that it did contain a dot, but that the domain name was not in the <a href="#domfile">domains file</a>. The hosts and domains concerned can be listed by turning <a href="#debugs">debugging</a> on. <hr> <hr> <a name="errors"><h2>Errors and warnings</h2> </a> This section lists all the errors and warnings which analog can produce, together with a short explanation. <p> First, you should understand the difference between a crash, an error, a warning, and a debugging message. A <i>crash</i> is when analog exits prematurely, without producing the whole output file. The system might give a message, but analog will not give one of its own messages. Analog should never crash. If it does crash, please <a href="#mailing">tell me about it</a>. <p> An <i>error</i> is something which stops analog finishing its job. Whenever an error is detected, analog gives a message starting something like <kbd>analog: Fatal error:</kbd> and will then tell you what type of thing went wrong before quitting. <p> A <i>warning</i> is a problem which is not fatal to analog: it will keep on with its processing. These vary from the possibly serious, such as files which could not be found, to purely informational. They produce a message starting <kbd>analog: Warning</kbd>. You can turn warnings off using the <kbd><a href="#WARNINGS">WARNINGS</a></kbd> command. <p> Finally, a <i>debugging message</i> gives information on the state of the program. They just begin with a single code letter followed by a colon. You don't get any debugging messages unless you've <a href="#debugs">asked for them</a>. <p> If you want to send these messages to a file instead of to the screen, you can use the <kbd><a href="#ERRFILE">ERRFILE</a></kbd> command.
To tell analog the width of your screen for these messages, you can use the <kbd><a href="#ERRLINELENGTH">ERRLINELENGTH</a></kbd> command. <p> Now I shall describe all the possible <a href="#errs">errors</a> and <a href="#warns">warnings</a> in detail. <hr> <h3><a name="errs">Errors</a></h3> <dl> <dt><b>Ran out of memory: cannot continue</b> <dd>Analog ran out of memory. Try increasing the memory available to the process, if your operating system will allow it, or using the <kbd><a href="#lowmem">LOWMEM</a></kbd> commands. <dt><b>Cannot ignore mandatory configuration file</b> <dd>See the section in the Readme on the <a href="#specialcfgs">mandatory configuration file</a>. <dt><b>Can't find language file <br>Language file too short <br>Language file contains excessively long lines</b> <dd>Analog can't run without a well-formed language file. See the documentation on <a href="#LANGUAGE">language files</a>. <dt><b>Attempted to read more than 50 configuration files</b> <dd>The most likely explanation for this is that you have accidentally created a loop using the <kbd><a href="#CONFIGFILE">CONFIGFILE</a></kbd> command, for example if a configuration file includes itself. <dt><b>Incorrect default given in <kbd>anlghead.h</kbd> <br>Default given in <kbd>anlghead.h</kbd> too short</b> <dd>If you've compiled your own version, and you've specified an incorrect configuration in the file <kbd>anlghead.h</kbd>, analog gives up to allow you to fix it. <dt><b>Failed to open output file for writing</b> <dd>Analog couldn't create, or couldn't write to, the output file you specified. <dt><b>Cache output file already exists: won't overwrite</b> <dd>Analog won't overwrite an old cache file. You must move or delete it yourself first. <dt><b><kbd>OUTFILE</kbd> and <kbd>CACHEOUTFILE</kbd> are the same</b> <dt><b><kbd>OUTFILE</kbd> and <kbd>CACHEOUTFILE</kbd> both set to stdout</b> <dd>This can't be what you wanted, because one would overwrite the other. 
<dt><b><kbd>OUTPUT NONE</kbd> and <kbd>CACHEOUTFILE none</kbd> selected</b> <dd>You requested no output. </dl> <hr> <h3><a name="warns">Warnings</a></h3> Remember that warnings are not fatal: in fact some are rarely even serious. You can turn them off using the <kbd><a href="#WARNINGS">WARNINGS</a></kbd> command. The possible warnings come in several different categories, shown by a letter in the warning message. The categories are as follows. <dl compact> <dt><kbd>C</kbd><dd>invalid configuration specified <dt><kbd>D</kbd><dd>dubious configuration specified <dt><kbd>E</kbd><dd><kbd>ERRFILE</kbd> command used <dt><kbd>F</kbd><dd>files missing or corrupt <dt><kbd>L</kbd><dd>apparent problems in logfiles <dt><kbd>M</kbd><dd>possible problems in logfiles <dt><kbd>R</kbd><dd>turning off empty reports </dl> <p> <h4><a name="warnsC">Category C</a></h4> This category indicates an incorrect configuration. Analog will either ignore what you said, or try and do the best it can with it. There are too many warnings in this category to list completely. You will have to consult the documentation for the particular <a href="#custom">configuration command</a> that gave the warning. If you get a warning for a command which used to work in a previous version of analog, have a look in the section <cite><a href="#update">Updating from older versions</a></cite>. <p> <h4><a name="warnsD">Category D</a></h4> This is for configurations which might be intended, but which look suspicious. Analog will not override what you've specified in this case. <dl> <dt><b><kbd>LOGFORMAT</kbd> with no subsequent logfile</b> <dd>You have specified a <kbd>LOGFORMAT</kbd> command, but no subsequent logfile to which it could be applied. Most likely you put the <kbd>LOGFORMAT</kbd> after the <kbd>LOGFILE</kbd> command. You must put the <kbd>LOGFORMAT</kbd> before the <kbd>LOGFILE</kbd> command or use <kbd>DEFAULTLOGFORMAT</kbd> instead.
See the section on <cite><a href="#logfmt">Specifying a log format</a></cite> for more details. <dt><b>Offset not a multiple of 30 <br>Offset more than 25 hours</b> <dd>The <a href="#TIMEOFFSET">time offsets</a> are meant to be for correcting between differences in time zones. These differences are usually multiples of 30 minutes between -25 and +25 hours. Maybe you specified the offset in hours instead of minutes by mistake, or something like that. <dt><b><kbd>FROM</kbd> time is later than the present</b> <dd>Usually this will mean that no entries are counted. Analog doesn't try and correct it in case the clock on your computer or your server is wrong -- but you would be better using <kbd><a href="#TIMEOFFSET">TIMEOFFSET</a></kbd> or <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> to correct those clocks. <dt><b><kbd>SORTBY</kbd> doesn't match <kbd>FLOOR</kbd> <br><kbd>SORTBY</kbd> doesn't match <kbd>SUBSORTBY</kbd> (or <kbd>FLOOR</kbd>/<kbd>SUBFLOOR</kbd>) <br><kbd>SORTBY</kbd> (or <kbd>FLOOR</kbd> or <kbd>GRAPH</kbd>) isn't included in <kbd>COLS</kbd></b> <dd>Within one report, it's helpful to your readers to have the sort methods and the floors compatible, and all included in the <kbd>COLS</kbd>. (See the section on <cite><a href="#othreps">Non-time reports</a></cite>). <dt><b>Column <kbd>N</kbd> with <kbd>SORTBY ALPHABETICAL/RANDOM</kbd></b> <dd>Numbering off the items when they're not in order of busyness is probably a mistake. <dt><b>Time reports have not all got same value of <kbd>BACK</kbd></b> <dd>It's usually helpful to have all the <a href="#BACK">time reports</a> running in the same direction. <dt><b>Report contains no <kbd>COLS</kbd></b> <dd>You've got an empty <kbd>COLS</kbd> list for one report, so you'll just get a list of names, not any information about them. 
<dt><b><kbd>LOWMEM 3</kbd> prevents that item being cached</b> <dd>You're making a <a href="#cache">cache file</a>, but one item is not being recorded because of a <kbd><a href="#lowmem">LOWMEM</a></kbd> command, and will therefore not be saved in the cache file. </dl> <p> <h4><a name="warnsE">Category E</a></h4> There is only one warning in this category. <dl> <dt><b>Redirecting future diagnostic messages</b> <dd>You've used an <kbd>ERRFILE</kbd> command to change the destination of errors, warnings, debugging and <kbd>PROGRESSFREQ</kbd> diagnostics. This is just warning you so that you don't miss any messages. </dl> <p> <h4><a name="warnsF">Category F</a></h4> This category is for diagnosing files which couldn't be opened or read successfully. These can be serious, but most of the messages should be self-explanatory. There are three worth mentioning specifically. <dl> <dt><b>Can't auto-detect format of logfile</b> <dd>The <kbd><a href="#logfmt">LOGFORMAT</a></kbd> is set to automatic detection, but the first line of the logfile is not in any of the standard formats. This warning is often generated when you try and specify your own <kbd>LOGFORMAT</kbd> but put it after the <kbd>LOGFILE</kbd> command so that it is not in effect for that logfile. <dt><b>Logfile with ambiguous dates</b> <dd>Some servers, notably IIS and WebSite, record dates in their logfiles according to local conventions. Then if analog encounters 2/1/99, for example, it doesn't know whether it's the 2nd January or 1st February. This problem, and what to do about it, is described in more detail in the section on <cite><a href="#IISfmt">Choosing a logfile</a></cite>. <dt><b>DNS lock file already exists</b> <dd>To stop two copies of analog trying to write the DNS file at the same time, an empty "lock file" is created, which tells the second copy of analog to use <kbd>DNS LOOKUP</kbd> instead of <kbd>DNS WRITE</kbd>. If analog crashes, it may not delete its lock file.
So if you get the "already exists" message even though no other copy of analog is running, you may need to delete the lock file yourself. </dl> <p> <h4><a name="warnsL">Category L</a></h4> When analog finishes reading a logfile, it checks whether there might have been something wrong with it. <dl> <dt><b>Large number of corrupt lines</b> <dd>This could indicate a problem with the logfile, or with the <kbd><a href="#logfmt">LOGFORMAT</a></kbd> specification. The possible causes are described in the section about <cite><a href="#corruptlines">Choosing a logfile</a></cite>. <dt><b>Logfiles overlap: possible double counting</b> <dd>Two logfiles which were counting the same type of item overlapped in time. Maybe you read two copies of the same logfile. Or maybe the <kbd><a href="#starredfmt">LOGFORMAT</a></kbd> specification should have told analog to ignore some of the items. Or it could be that the logfiles are in fact disjoint and there wasn't really a problem: analog only checks the dates of the logfiles, not the details of them. In this last case, the statistics produced will still be correct. </dl> <p> <h4><a name="warnsM">Category M</a></h4> This category is for warnings about logfile formats which might make analog produce unexpected results. <dl> <dt><b>Logfile contains lines with no [whatevers], which are being filtered</b> <dd>This is usually harmless. It is perhaps best explained by example. Suppose you are <a href="#include">excluding</a> certain files from the analysis, but that you are also analysing a browser log which just contains information about the browsers used, not which files they read. Then we can't exclude the browsers which read the excluded files, because we don't know which they were, so all browsers will be included. <dt><b>Logfile contains lines with no file names (or bytes): page (or byte) counts may be low</b> <dd>If a logfile line doesn't contain a file name, analog will assume that the request wasn't for a page. 
Similarly, if it doesn't give the number of bytes transferred, analog will make the bytes zero. So the number of page requests or bytes credited to the other items on that line will then be too low. </dl> <p> <h4><a name="warnsR">Category R</a></h4> This is used when analog turns off an empty report. This could be because none of the relevant items were included in any of the logfiles, or perhaps because a <kbd><a href="#lowmem">LOWMEM</a></kbd> command stopped them being recorded. <p> <h4><a name="brokenpipe">Broken Pipe</a></h4> This is not an analog-generated warning, but it can result from analog closing a logfile it's uncompressing without reading the whole of it, when it determines that it will not need it. <hr> <hr> <a name="faq"><h2>Frequently asked questions</h2> </a> This list is divided into six sections: <ol type="A"> <li><a href="#startfaq">Getting Started</a> <li><a href="#configfaq">Basic Configuration</a> <li><a href="#underfaq">Understanding the Output</a> <li><a href="#advfaq">Advanced Usage</a> <li><a href="#formfaq">Form Interface</a> <li><a href="#designfaq">Design Decisions</a> </ol> <h3><a name="startfaq">A. Getting Started</a></h3> Most questions in this category are answered in the section entitled <cite><a href="#start">Starting to use analog</a></cite>. If you can't get analog running you should look there. <ol> <li><b>Analog doesn't have a <kbd>setup.exe</kbd>.</b> <br>No, and it doesn't need one. It's already ready to run! See <cite><a href="#startpc">Starting to use analog under Windows</a></cite>. <li><b>Analog just flashes up a DOS window and then quits.</b> <br>This is the correct behaviour. It should have created a report called <kbd>Report.html</kbd>. See <cite><a href="#startpc">Starting to use analog under Windows</a></cite>. <li><b>When I try and compile analog, it gives me an error (e.g. on SunOS 5).</b> <br>Maybe you need to edit the Makefile.
There are some platform-specific notes in the section <cite><a href="#startux">Starting to use analog on other platforms</a></cite>, and in the Makefile itself. <li><b>Analog didn't write the logfile when I ran it.</b> <br>Analog doesn't write the logfiles. Your web server writes the logfiles, and analog just reads them. See <cite><a href="#start">Starting to use analog</a></cite>. <li><b>Analog is looking for files like <kbd>/usr/local/etc/httpd/analog/analog.cfg</kbd> which don't exist.</b> <br>You have to set the location of these files in <kbd>anlghead.h</kbd> before compiling. <li><b>Analog won't read extended logfiles generated by IIS.</b> <br>This server writes the date only at the top of the logfile, not on every line. But it doesn't write a new date if the date changes during the logfile, so analog can't tell which date later entries in the log occurred on. More details, and what to do about it, are in the section on <cite><a href="#dateonly">Choosing a logfile</a></cite>. <li><b>What does "Logfile with ambiguous dates" mean?</b> <br>See the section on <cite><a href="#warnsF">Errors and warnings</a></cite>. <li><b>What does this error message mean?</b> <br>Again, see the section on <cite><a href="#errors">Errors and warnings</a></cite>. <li><b>I tried to run analog from my browser, but it didn't work.</b> <br>Analog should not be run as a CGI program, or even put in the folder with your CGI programs, for security reasons. You should use the special <a href="#form">CGI program</a> instead. <li><b>Is analog Year 2000 compatible?</b> <br>Yes (and so are all previous versions). It interprets two-digit years in input as lying between 1970 and 2069 inclusive. </ol> <h3><a name="configfaq">B. Basic Configuration</a></h3> Analog has lots of configuration commands, all of which are in the section on <cite><a href="#custom">Customising analog</a></cite>. Here are some of the most common questions.
If your question isn't answered here, you could also try looking in the <a href="#indx">index</a>. <ol> <li><b>I want to make several different statistics pages. Do I have to install several copies of analog?</b> <br>No. Just install it once, and run it with different <a href="#CONFIGFILE">configuration files</a>. <li><b>My <kbd>analog.cfg</kbd> included lots of <kbd>CONFIGFILE</kbd> commands, but only one report was produced.</b> <br>Analog can only produce one report per run. To produce several reports, you have to run it several times. <li><b>Why doesn't the Daily Report only show the last six weeks?</b> <br>This is controlled by the <kbd><a href="#ROWS">FULLDAYROWS</a></kbd> command. <li><b>Why do the time reports all list 0 requests?</b> <br>They probably only list 0 requests for pages. Maybe you need to use <kbd><a href="#PAGEINCLUDE">PAGEINCLUDE</a></kbd> to count more files as pages. <li><b>How do I get the Request Report to list files with fewer than 20 requests?</b> <br>Use the <kbd><a href="#FLOOR">REQFLOOR</a></kbd> command. <li><b>How do I ignore accesses from my site?</b> <br>Use the <kbd><a href="#include">HOSTEXCLUDE</a></kbd> command. <li><b>How do I ignore internal referrers in the Referrer Report?</b> <br>Use the <kbd><a href="#include">REFREPEXCLUDE</a></kbd> command. <li><b>How do I get information on just my pages, not everybody's?</b> <br>Use the <kbd><a href="#include">FILEINCLUDE</a></kbd> command. <li><b>I used the command "<kbd>DIREXCLUDE /mydir/</kbd>", but files in that directory were still listed.</b> <br><kbd>DIREXCLUDE</kbd> only affects the Directory Report, not the other reports. You want "<kbd>FILEEXCLUDE /mydir/*</kbd>" instead. <li><b>I used the command "<kbd>FILEEXCLUDE /cgi-bin/script.pl</kbd>", but that file was still listed in the Request Report.</b> <br>If the file has search arguments, you have to be a bit careful with <kbd>FILEEXCLUDE</kbd>. 
This is described in the section about <a href="#unintuitive">search arguments</a>. <li><b>Does the order of the commands matter in the configuration file?</b> <br>Only occasionally. If you have two of one command, the later one will generally override the earlier one. Apart from that, commands can come in any order, except that <kbd><a href="#logfmt">LOGFORMAT</a></kbd> and <kbd><a href="#TIMEOFFSET">LOGTIMEOFFSET</a></kbd> commands must come before the <kbd>LOGFILE</kbd> to which they refer. <li><b>Why are my browser and referrer reports empty?</b> <br>Maybe your logfile doesn't contain any browser and referrer information? <li><b>Why isn't the Referrer Report sorted properly?</b> <br>It is sorted properly. But <a href="#args">search arguments</a> are also listed under the file they belong to, and this interrupts the ordering. If you set the <kbd><a href="#ARGSFLOOR">REFARGSFLOOR</a></kbd> high enough you won't see the search arguments. Or you can include the <a href="#othCOLS"><kbd>N</kbd> column</a> to make the ordering more obvious. <li><b>Why can't I have <kbd>P</kbd> in the <kbd>REQCOLS</kbd> or <kbd>REQSORTBY</kbd>?</b> <br>The number of page requests doesn't make sense in the Request Report because it's either the same as the number of requests (if the file is a page) or zero (if it isn't). If you want to list only pages in this report, use <kbd>REQINCLUDE pages</kbd> instead. <li><b>I want to list (<i>or</i> not to list) referrers with their search arguments in the Referrer Report.</b> <br>To see the search arguments you may need to set the <kbd><a href="#ARGSFLOOR">REFARGSFLOOR</a></kbd> lower. To avoid seeing them, you could set the <kbd>REFARGSFLOOR</kbd> higher, or alternatively use the <kbd><a href="#ARGSINCLUDE">REFARGSEXCLUDE</a></kbd> command to ignore them either for all files or just for particular files.
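<br>As a rough sketch of the two ways of hiding search arguments just described (the floor value and the referrer pattern are illustrative assumptions, not defaults, and the floor is assumed to take the same form as the other <kbd>FLOOR</kbd> commands; use one approach or the other): <pre>
# Assumed example value: only list search arguments with at least 50 requests
REFARGSFLOOR 50r

# Hypothetical site: ignore search arguments on referrers from this site only
REFARGSEXCLUDE http://www.example.com/*
</pre>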
<li><b>Can I find out which files each referrer pointed to?</b> <br><i>or</i> <b>Can I find out which files each host has read?</b> <br><i>or</i> <b>Can I find out which hosts have read each file?</b> <br><i>or</i> <b>Can I find out the number of hosts visiting on each day?</b> <br><i>or <b>lots of similar questions.</b></i> <br>There are lots of questions like this. They all want analog to cross-reference two sorts of item (e.g. files and referrers in the first example above, or hosts and dates in the last). Granted, these would be useful. But it is fundamental to analog's speed and minimal memory requirement that it only records statistics for each type of item individually, and doesn't record enough information to cross-reference them afterwards. <br>What you can do is to restrict the analysis to just requests from certain referrers (for example) with the <kbd><a href="#include">REFINCLUDE</a></kbd> command, or to a particular time period with <a href="#FROMTO"><kbd>FROM</kbd> and <kbd>TO</kbd></a>. This is often good enough. <li><b>Can I use <kbd>%d</kbd>, <kbd>%m</kbd> etc. in the <kbd>LOGFILE</kbd>, like I can in the <kbd><a href="#OUTFILE">OUTFILE</a></kbd></b>? <br>No. This is rarely useful, because you can only get at one logfile that way. If you're on Unix, you can embed the date in the logfile name using the <kbd>date</kbd> command: for example, <pre>analog access.`date +%Y%m%d`.log</pre> <li><b>I get the message "logfiles overlap" even though the two logfiles contain completely separate requests.</b> <br>This message is based only on the dates of the files, not the contents. If you're sure there is no problem, you can turn it off with the command <kbd><a href="#debug">WARNINGS -L</a></kbd>. <li><b>Can I get data on individual visitors, or visits, to my site?</b> <br>No, it's not technically possible, and don't believe any program which tells you it is. See the section on <cite><a href="#webworks">How the web works</a></cite> for details. 
<li><b>Can I change the background colour of my output?</b> <br>Yes. The correct way to do this is to write a style sheet, and then use the <kbd><a href="#STYLESHEET">STYLESHEET</a></kbd> command. <li><b>Can I change the way dates are formatted in the output?</b> <br><i>or</i> <b>Can I change some of the phrases in the output?</b> <br>Yes, by editing the <a href="#LANGUAGE">language file</a>. </ol> <h3><a name="underfaq">C. Understanding the Output</a></h3> Most of the questions in this category are answered in the section on <cite><a href="#meaning">What the results mean</a></cite>, which I really recommend you read if you want to understand what analog is telling you. <ol> <li><b>How do I find out the number of hits from your data?</b> <br>I don't use the word <i>hits</i>, because people use it in different ways, so it's misleading. I use <i>requests</i> for the number of transfers of any type of file (text, graphics, ...), and <i>page requests</i> for the number of transfers of HTML pages. See the section on <cite><a href="#defns">Analog's definitions</a></cite> for more information. <li><b>Why are there so many referrers from my own site?</b> <br>These come from all the internal links on your site, and all the graphics on your pages. See the section on <cite><a href="#webworks">How the web works</a></cite> for more information. If you don't want to see them, you can use <kbd><a href="#outputexcludes">REFREPEXCLUDE</a></kbd> to exclude them. <li><b>Why doesn't analog agree with the counter on my page?</b> <br>There are lots of possible reasons. Do they both start from the same date? Are you just looking at requests for that one page with analog, not for all your other pages and graphics? Also, analog will record all requests to that page; if it's a graphic, your counter will only measure requests from people on graphical browsers that reached that place on the page. 
<li><b>Why do I only get "unresolved numerical addresses" in the domain report?</b> <br>Your server only records the numerical IP address of the hosts that contact you, not their names. Read the section about <cite><a href="#dns">DNS lookups</a></cite>, or turn DNS resolution on in your server. <li><b>Why are my click-thru's (<i>or</i> CGI scripts) not listed in the Request Report?</b> <br>If they cause a redirection to another page, they will be listed in the Redirection Report, rather than the Request Report. <li><b>Why are directories listed in the Request Report?</b> <br>They are not directories, they are pages with the same name as the directory. For example, I have both a directory called <kbd>/analog/</kbd> and a page called <kbd>/analog/</kbd> (which happens to be the same as <kbd>/analog/index.html</kbd>). <li><b>When someone reads one of my pdf files, it scores dozens of hits.</b> <br>PDF files are often downloaded and read one page at a time, and each page will then count as a separate request. Although this is not ideal, it's much less clear what to do about it. Analog has no way of knowing how many pages constituted a single download in the reader's mind. As usual, we can only reliably report how many requests there were at the server, not guess what users did with the file later. </ol> <h3><a name="advfaq">D. Advanced Usage</a></h3> <ol> <li><b>How can I do such-and-such with a command line option?</b> <br>Use the <kbd><a href="#plusC">+C</a></kbd> option to put any configuration command on the command line. <li><b>I want a list of all command line arguments.</b> <br>There is a list in the <a href="#clargs">index</a>. <li><b>Can analog read FTP logfiles?</b> <br>Yes. If you are using the xferlog format, then there is a configuration file to help you in the <kbd>examples</kbd> directory. Otherwise you will have to write your own <kbd><a href="#logfmt">LOGFORMAT</a></kbd>. 
(You probably won't be able to read anything other than the lines corresponding to file transfers.) <li><b>How can I run analog automatically every day?</b> <br>This depends on your particular machine. On Unix, you need to run analog as a cron job (see "man cron"). This is my cron command to run it at 1:50am every day: <br><kbd>50 1 * * * $HOME/bin/analog</kbd> <br>On Windows NT you can do the same with the at command, but only an administrator can run at. On Windows 98, it should be possible with the Task Scheduler, although I haven't tried it. On Windows 95 it's not possible as far as I know. <br>On Mac, there are programs called <a href="http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/cfg/chris-cron-10a7.hqx">Cron</a> or <a href="http://hyperarchive.lcs.mit.edu/HyperArchive/Archive/app/time/">CronoTask</a> to do this. <li><b>I'm setting up IIS. Which logfile format should I use?</b> <br>The W3C format is probably best. You can turn fields on and off in this format. And it contains all the possible fields which can be logged, which the other formats do not. However, it is important to turn the date field on (it's off by default), not just to log the date once at the top: see the section on <a href="#dateonly">problems with logfile formats</a> for why. <li><b>I host lots of virtual domains. How should I set up analog?</b> <br>There's a file in the <kbd>examples</kbd> directory which discusses this issue. <li><b>Can I make multiple reports with one pass through the logfile?</b> <br>Not at the moment. I want to do this in a future version, but it will require some considerable work. <li><b>I ran out of memory when trying to run analog. What can I do?</b> <br>See the section on <a href="#lowmem">Coping with low memory</a>. <li><b>You're processing 20,000,000 requests in under 10 minutes. Why is mine much slower?</b> <br><i>or</i> <b>Analog appears to stall.</b> <br>If you have <a href="#dns">DNS lookups</a> on, they are very slow. 
Otherwise, it probably depends on the speed of your computer and disks, and what other programs are running at the same time. You can use the <kbd><a href="#PROGRESSFREQ">PROGRESSFREQ</a></kbd> command to see if it's really stalled or whether it's just being slow. If you are running out of memory, you might find analog's <kbd><a href="#lowmem">LOWMEM</a></kbd> commands helpful. <li><b>How do I make a link on my page that runs analog?</b> <br>Link to the <a href="#form">anlgform</a> program, with the desired options. But be careful about the load on your server. <li><b>Do I have to save all my old logfiles?</b> <br><i>or</i> <b>Can analog make statistics from an old report instead of reading the whole logfile again?</b> <br>These questions are answered in the section about <cite><a href="#cache">Cache files</a></cite>. <li><b>Can analog write to a database or spreadsheet?</b> <br>Use the <a href="#compout">computer-readable output style</a>, which can export to CSV. Or if what you really want to do is to run analog again without re-reading the logfiles, read the section about <cite><a href="#cache">Cache files</a></cite>. </ol> <h3><a name="formfaq">E. Form Interface</a></h3> There is also a section on <a href="#trouble">troubleshooting</a> in the documentation about the form interface. <ol> <li><b>I couldn't make the form run.</b> <br>Have you made analog work without the form? Have you run <kbd>anlgform.pl</kbd> from the command line as explained in the section on <a href="#trouble">troubleshooting</a>? <li><b>How can I specify different logfiles from the form interface?</b> <br>Just add a new field to the form with <kbd>name=LOGFILE</kbd>. <li><b>I specified <kbd>LOGFILE=/var/log/apache/*</kbd> from the form but it didn't work.</b> <br>On the form, you can't use wildcards in the <kbd>LOGFILE</kbd> name for <a href="#security">security reasons</a>.
<li><b>My browser showed me anlgform.pl, rather than running it.</b> <br>You have to tell the server to execute the CGI program, not just send it out like it would for a normal file. Often this is done by putting it in a special <kbd>/cgi-bin/</kbd> directory. <li><b>Why does the form interface give "Document Returned no Data"?</b> <br>If it doesn't happen for a while, then probably the server is giving up before the analog process has finished running. Increase the timeout interval on the server. <li><b>The images don't appear when running analog from the form interface.</b> <br>You probably need to set the <kbd><a href="#IMAGEDIR">IMAGEDIR</a></kbd>. If the images are in your <kbd>/cgi-bin/</kbd> directory, the server will normally try to execute them instead of just sending them out. <li><b>Why do I get some reports that weren't requested on the form?</b> <br>If a report is neither included nor excluded on the form, the system default will be used. This will depend on your configuration files and on compile-time settings. </ol> <h3><a name="designfaq">F. Design Decisions</a></h3> or "Why didn't you do it this way?" <ol> <li><b>Why doesn't the <kbd>HEADERFILE</kbd> replace the whole <kbd>&lt;head&gt;</kbd> of the output file?</b> <br>Because you almost never get valid HTML that way. Use a <a href="#STYLESHEET">style sheet</a> instead. <li><b>Why not use HTML tables?</b> <br>Most non-graphical browsers don't do a good job with tables. Also tables aren't available in HTML 2.0, which is the sort of HTML analog writes. <li><b>Why are you still using HTML 2.0?</b> <br>Unfortunately my bar charts aren't valid in HTML 4.0. <li><b>It would be better if you used png's instead of gif's.</b> <br>I'm aware of the issues. But png support isn't good enough even in new browsers; and I have always made a point of designing analog to work even on old browsers.
<li><b>Why not just do DNS resolution of the hosts that actually make it into the Host Report?</b> <br>There is one theoretical and one practical problem. Theoretically, the problem is that which hosts do make it into the Host Report can change when the DNS lookups have been done. And practically, this wouldn't help identify the busiest countries or organisations, which is usually what you really want to know. However, there is a Perl script on the <a href="#helpers">helper applications page</a> to do this. <li><b>Couldn't you do the DNS lookups faster with threads?</b> <br>The problem is, the standard commands for DNS lookups are not thread-safe on many platforms, so it would involve a lot of platform-specific code. Again, there are programs for specific platforms on the <a href="#helpers">helper applications page</a>. <li><b>Why doesn't analog analyse the error_log?</b> <br>This is answered in detail on the <cite><a href="#abolished290b1">What's new?</a></cite> page. But in summary, it's too difficult because each server has a different format for its error log. The various failure reports are good enough for most purposes. <li><b>My server lists local names in the logfile. Can you put a common suffix on them automatically?</b> <br>This wouldn't be a good idea by default, because things like "unknown" would get the suffix. You can always add them using <kbd><a href="#useraliases">HOSTALIAS</a></kbd>. On operating systems with regular expressions, there is an example to accomplish this in the <a href="#aliasregexp">section about aliases</a>. <li><b>Can you extrapolate from the current month's partial data to produce a prediction for the whole month, based on the rate so far?</b> <br>No. There are too many problems in trying to produce anything sensible, especially near the beginning of the month. Different days of the week and different times of day cause lots of problems. I would prefer to produce raw accurate data than suspect derived data. 
<li><b>Can you extend the Domain Report to say which US states people visited from?</b> <br>No. Some programs pretend to do this, but you can actually only tell which state the computer the person was using is in, which may be quite different from where the user was for ISP's or other large organisations. <li><b>Why not use language codes instead of country codes for the names of the language files?</b> <br>People are more familiar with the country codes. And not all of my languages have language codes anyway. <li><b>Why don't you sell analog?</b> <br>I didn't write analog for the money, and I'm happy just to see people use it. Also, by making it open source, lots of people send me ideas and code to include in future versions. How do you think I got all those languages? (Of course, if you want to send me money, or gifts in kind, or even just postcards...). </ol> <hr> If there's still something you can't figure out, see the <a href="#mailing">next section</a> for how to get help with analog. <hr> <hr> <a name="mailing"><h2>Mailing lists</h2> </a> I welcome mail about analog, both praise and bug reports! I and others are also usually happy to help people who have trouble with analog: it helps me to find bugs, and know where the documentation is unclear. <p> There are three mailing lists for analog. <dl> <dt><kbd>analog-announce</kbd> <dd>Announcements about analog. I post to this when there are new versions, for example. Usually only gets a few messages a year. <dt><kbd>analog-help</kbd> <dd>Getting help with analog from experienced users. This is the place to go if you have trouble setting up or configuring the program. Usually you will get a swift reply. <em>You have to subscribe to the list before you can send a message</em>. There is also a <a href="http://www.mail-archive.com/analog-help@lists.isite.net/">searchable web archive</a> of the list. <dt><kbd>analog-author</kbd> <dd>This just goes to me. 
Use for private comments, or other things that would not be suitable for the <kbd>analog-help</kbd> list. You may or may not get a swift reply, depending how busy I am with other things. </dl> <p> To subscribe to the analog-announce mailing list, send a message to <a href="mailto:analog-announce-request@lists.isite.net">analog-announce-request@lists.isite.net</a> with the word <kbd>subscribe</kbd> in the main body of the message. Note that the word has to be in the body of the message, not the subject. Also please note the <kbd>-request</kbd> part of the address. <p> If you want to get help with analog, please check the following simple things first. <ol> <li>Read the <a href="#faq">FAQ</a>. Maybe I've answered your question already. If I have, I'll just direct you to the FAQ, not answer it again. <li>Read the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/bugs.html">list of known bugs</a> at my site, to see if your bug is already known about. <li>Read the other relevant pages of the Readme, particularly the sections on <cite><a href="#start">Starting to use analog</a></cite> and <cite><a href="#custom">Customising analog</a></cite>. You may also find the <a href="#indx">index</a> useful. I don't appreciate people who are too lazy to read the documentation. (If the documentation is unclear, or the relevant paragraph is too well hidden, then that's a different matter. Of course I want to know about that.) <li>Have a look in the <a href="http://www.mail-archive.com/analog-help@lists.isite.net/">web archive</a> of the mailing list to see if your question has already been answered there. <li>If analog isn't doing what you thought you asked it to, then run it with the <kbd><a href="#settings">SETTINGS ON</a></kbd> configuration command, and see what options it thinks it's meant to be using. </ol> I'm sorry to be so fussy, but a lot of the mail on the list really needn't have been sent at all, and just wastes the time of everybody on the list. 
As I say, I really do welcome genuine mail. <p> If you still need help, write to the analog-help mailing list. First you have to subscribe (you can't send mail without subscribing) by sending a message to <a href="mailto:analog-help-request@lists.isite.net">analog-help-request@lists.isite.net</a> with the word <kbd>subscribe</kbd> in the main body of the message. (Note that the word has to be in the body of the message, not the subject.) After you've received an acknowledgement that you have subscribed, you can send mail to analog-help@lists.isite.net. Don't try and use this address for subscribing though. It won't work! <p> Please do the following when you send mail to the list. <ol> <li>Describe exactly what you did, what you expected, and what the computer did. Include the <em>exact text</em> of any error messages, not a précis. <li>Mention which version of analog you are using, on which operating system. <li>Give your mail a subject line which indicates immediately what aspect of analog it is about. (This is useful for the <a href="http://www.mail-archive.com/analog-help@lists.isite.net/">archive</a>). <li>Do <strong>not</strong> send long files or attachments unless you're asked to. We do not want to see your configuration file, your header file, your output file, or any logfile over 10 lines long. They are almost always useless to us. And anyway, excessively long messages will be rejected by the mailing list server. </ol> <p> If you want to send a private message to me, you can send it to me at <a href="mailto:analog-author@lists.isite.net">analog-author@lists.isite.net</a>. Please don't use this address for user support questions: keep them on the <kbd>analog-help</kbd> list. <p> Many thanks to <a href="http://www.isite.net/">ISite</a> for providing these mailing lists for me, and to <a href="http://www.mail-archive.com/">The Mail Archive</a> for archiving the analog-help list. 
<hr> <hr> <a name="helpers"><h2>Helper applications</h2> </a> Some people have written helper applications for analog. These are independent programs which work together with analog to make certain tasks easier. There are graphical configuration tools, for example, or tools which post-process analog's output to produce graphs. There are tools to do the DNS lookups more quickly, configuration files for certain jobs, and lots of other things. <p> These helper applications are all listed at the analog site. The list is growing quite quickly, so I'm not distributing it with the program. But I strongly recommend you go to the <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">analog home page</a> (or even better, to your local mirror site) and check it out. <p> There are also some example configuration files in the <kbd>examples</kbd> directory or folder distributed with the program. <hr> <hr> <a name="acknow"><h2>Acknowledgements</h2> </a> Many people have helped me with analog, and I can't thank them all specifically. But I do appreciate everyone who's given me feedback or sent me bug reports. <p> Thanks are due to the author of <a HREF="http://www.eit.com/software/getstats/">getstats</a>, Kevin Hughes. In the days before analog there were only three serious logfile analysis programs, and only one of them, getstats, had attractive output. I wrote analog when getstats stopped being able to cope with the size of our logfile, but my output still looks somewhat similar to his. <p> Thanks are also due to all those who helped in the early stages of writing this program, and gave me the encouragement to continue with analog and to release it publicly. Those who made helpful suggestions during beta testing are numerous, but I must mention particularly Dan Anderson, Martyn Johnson, Joe Ramey, Chris Ritson, Quentin Stafford-Fraser and Dave Stanworth. Above all Gareth McCaughan gave me lots of programming advice. The program would have run much more slowly without him. 
<p> Many people have provided mirror sites for analog, starting with Dave Stanworth (again!). The full list of mirror sites is listed <a href="http://www.statslab.cam.ac.uk/~sret1/analog/mirrors.html">elsewhere</a>; thanks to all of them. Many thanks also to <a href="http://www.isite.net/">ISite</a> for providing the mailing lists, and to <a href="http://www.mail-archive.com/">The Mail Archive</a> for archiving the analog-help list. <p> Mark Roedel first suggested porting analog to different platforms, and made the original DOS port. Shortly afterwards, Jason Linhart made the Mac port, and has continued to contribute lots of extra code for that platform and for the program in general. The Mac version also includes code contributed by Stephan Somogyi and Nigel Perry, and uses the <a href="http://www.cdrom.com/pub/infozip/zlib/">ZLib library</a> by Jean-loup Gailly & Mark Adler. Later ports were made by Dave Jones, Martin Zinser & Rick Dyson (OpenVMS), Magnus Hagander (Win32), Nick Smith (Acorn RiscOS), Scott Tadman (BeOS), and Martin Kraemer & Holger Schranz (BS2000/OSD). Ivan Martinez compiles the OS/2 version. The BS2000/OSD port includes code developed by the Apache Group for use in the <a href="http://www.apache.org/">Apache HTTP server project</a>. If <kbd>NEED_MEMMOVE</kbd> is defined at compile time, then this product includes software developed by the University of California, Berkeley and its contributors. <p> The form interface is based on an idea by James Dean Palmer. Thanks to all the other people who have contributed bits of code too: I apologise for not having room to name all of them. And thanks to those who have written <a href="#helpers">helper applications</a>, for making analog more usable. <p> For the translations into other languages, many thanks are due to the following: Tigran Nazarian (Armenian), Emir Alikadic (Bosnian), Francesc Rocher, M. 
Mercè Llauge & Francesc Burrull i Mestres (Catalan), Yang Meng (Simplified Chinese), Andrew Choi (Traditional Chinese), Jan Simek & Karel Fajkus (Czech), Adrian Price (Danish), Ferry van het Groenewoud, Joost Baaij & Dimitry Smagghe (Dutch), Henrik Huhtinen, Steve Kelly & Andrew Staples (Finnish), Patrice Lafont, Lucien Vieira, Jean-Marc Coursimault & Lionel Delaude (French), Mario Ellebrecht, Martin Kraemer, Holger Schranz, Thomas Jacob, Thomas Frings & Georg Schwarz (German), Dimitris Xenakis (Greek), Laszlo Nemeth (Hungarian), Gustaf Gustafsson (Icelandic), Furio Ercolessi, Luca Andreucci & Alessio Bragadini (Italian), Takayuki Matsuki, Stephen Obenski and Motonobu Takahashi (Japanese), Byungkwan Kim (Korean), Jurijs Turjanskis (Latvian), Ingrid (Lithuanian), Jan-Aage Bruvoll, Espen Bjarnø & Pål Løberg (Norwegian Bokmål), Magni Onsøien (Norwegian Nynorsk), Wlodek Lapot, Tomek Wozniak & Marcin Sochacki (Polish), Ivan Martinez (Brazilian Portuguese), Jaime Carvalho e Silva (European Portuguese), Alex Mihaila (Romanian), San Sanych Timofeev, Boris Litvinenko & Vyacheslav Nikitich (Russian), Mile Peric (Serbian), Stefan Billik (Slovak), Andrej Zizmond & Dalibor Cvijetinoviè (Slovene), Javier Solis, Alexander Velasquez, Alfredo Sola, Martin Perez & Nelson Tactuk (Spanish), Björn Malmberg, Frank Osterberg & Wesley Schaal (Swedish), Nezih Erkman (Turkish), and Yaroslav Boychuk (Ukrainian). <p> Finally, thanks to all of you for using the program! <hr> <hr> <a name="whatsnew"><h2>What's new in this version?</h2> </a> This section lists the major new features in each version of analog. There's also another section about <a href="#update">how to upgrade</a> from older versions of analog, listing which commands have changed or been abolished, or how the output of this version differs from that of previous versions. <dl> <dt><b><a name="new403">4.03</a></b> (21-Feb-00) <dd>Fixed several small bugs. <br>New command <kbd><a href="#RUNTIME">RUNTIME</a></kbd>.
<br>Brazilian Portuguese language files and Swedish domains files. Corrections to Dutch. <dt><b><a name="new402">4.02</a></b> (31-Jan-00) <dd>New command <kbd><a href="#SCC">SEARCHCHARCONVERT</a></kbd>. <br>Support for Apache's new <kbd>%q</kbd> code in <kbd>APACHELOGFORMAT</kbd>. <br>Fix for search reports causing crashes on Windows. <br>New language: Czech. Corrections for Serbian, Slovene and Ukrainian. <dt><b><a name="new401">4.01</a></b> (17-Dec-99) <dd>New command <kbd><a href="#CASE">USERCASE</a></kbd>. <br>Some of the default paths have changed in <kbd>anlghead.h</kbd>. <br>Improvements to OpenVMS port. <br>Language files included for Armenian, Bosnian, Catalan, traditional Chinese, Dutch, Finnish, German, Italian, Slovak, Slovene, Spanish, Swedish & Ukrainian; corrections to Russian & Turkish. <dt><b><a name="new40">4.0</a></b> (16-Nov-99) <dd>Simplified Chinese, Danish, Japanese, Portuguese & Serbian language files included. <br>Otherwise only small changes since 3.90beta2. <dt><b><a name="new390b2">3.90beta2</a></b> (02-Nov-99) <dd>It is now recommended that you don't run analog as a CGI program for <a href="#notcgi">security reasons</a>. (The <kbd>CGI</kbd> command is still present, but it is now not documented.)<br> The Organisation Report is now <a href="#hierreps">hierarchical</a>.<br> The Browser Summary is now arranged by major version number. 
(See <a href="#update">notes on upgrading</a>.)<br> Non-exact bytes are now given to 3 decimal places.<br> <kbd><a href="#GOTOS">GOTOS FEW</a></kbd> puts the "Go To" lines just at the top and bottom of the output.<br> <kbd>PRINTVARS</kbd> has been renamed <kbd><a href="#settings">SETTINGS</a></kbd>.<br> <kbd><a href="#settings">-settings</a></kbd> output improved, especially with <kbd>OUTPUT NONE</kbd>.<br> Split <kbd>PAGEWIDTH</kbd> into <kbd><a href="#PAGEWIDTH">HTMLPAGEWIDTH</a></kbd> and <kbd><a href="#PAGEWIDTH">ASCIIPAGEWIDTH</a></kbd>.<br> Includes language files for French, Greek, Norwegian (Bokmål & Nynorsk), Polish, Russian and Turkish.<br> New configuration file <kbd>examples/big.cfg</kbd> containing most commands. <dt><b><a name="new390b1">3.90beta1</a></b> (07-Oct-99) <dd>First beta test for version 4. The most important new features are: <ul> <li>Five new reports: Organisation Report, Operating System Report, Search Word Report, Search Query Report, Processing Time Report. <li>Browser Summary improved (will <a href="#update">change results</a>). <li><a href="#form">Form interface</a> completely rewritten, and considerably simplified. <li>Multiple *'s now allowed on left-hand side of <a href="#useraliases"><kbd>ALIAS</kbd>es</a>. <li>Regular expressions allowed in <a href="#incregexp"><kbd>INCLUDE</kbd>s & <kbd>EXCLUDE</kbd>s</a>, and <a href="#aliasregexp"><kbd>ALIAS</kbd>es</a>. <li>The <a href="#outputexcludes">output <kbd>INCLUDE</kbd>s and <kbd>EXCLUDE</kbd>s</a> now apply to the lower levels of a <a href="#hierreps">hierarchical report</a> as well as the top level. <li>New commands: <kbd><a href="#form">CGI</a></kbd>, <kbd><a href="#STYLESHEET">STYLESHEET</a></kbd> and <kbd><a href="#ERRLINELENGTH">ERRLINELENGTH</a></kbd>. <li>New <a href="#othCOLS">column <kbd>N</kbd></a> in most reports. <li><kbd><a href="#debugs">DEBUG C</a></kbd> now reports which part of a corrupt logfile line is corrupt. 
<li>Non-exact bytes are now displayed as, e.g., 47.68 Mbytes instead of 48,832 kbytes. This should be less confusing. <li>Timestamps added to <kbd><a href="#PROGRESSFREQ">PROGRESSFREQ</a></kbd> reports. <li>The <a href="#dns">DNS file</a> has a new time encoding. <li>Header files split up to make <kbd>anlghead.h</kbd> simpler. <li>Form interfaces in German and U.S. English included. <li>New documentation about <a href="#args">search arguments</a>. <li>New <kbd>examples</kbd> directory. <li>New <a href="Licence.txt">licence</a>. (Nearly the same, just clarified, and slightly loosened). </ul> Note: most languages don't work in this beta-test version, but should be added again by version 4. (The language files are included in the distribution, but contain lots of English strings). <dt><b><a href="#wasnew3">What was new in version 3?</a></b> <dt><b><a href="#wasnew2">What was new in version 2?</a></b> <dt><b><a href="#wasnew1">What was new in version 1?</a></b> </dl> <hr> <hr> <a name="update"><h2>Upgrading from earlier versions</h2> </a> This section lists those commands which existed in older versions of analog, but which have been changed or abolished in this version. It also lists reasons why the same input might now produce different output. The new features in this version are listed in the section <cite><a href="#whatsnew">What's new in this version?</a></cite>. <h2><a name="up40">Upgrading from 4.0 and earlier</a></h2> <ul> <li>Some of the default paths have changed in <kbd>anlghead.h</kbd>. </ul> <h2><a name="up390b1">Upgrading from 3.90beta1 and earlier</a></h2> <ul> <li>It is now recommended that you don't run analog as a CGI program, or put it in the directory with your CGI programs, for <a href="#notcgi">security reasons</a>. <li>Each browser in the Browser Summary is now sorted by major version number then minor version number. So <kbd>SUBBROW */*</kbd> will now only show the major versions. 
To get all the minor versions, you need <kbd>SUBBROW */*.*</kbd> <li><kbd>PAGEWIDTH</kbd> has been replaced by <kbd><a href="#PAGEWIDTH">HTMLPAGEWIDTH</a></kbd> and <kbd><a href="#PAGEWIDTH">ASCIIPAGEWIDTH</a></kbd>. <li><kbd>PRINTVARS</kbd> has been renamed <kbd><a href="#settings">SETTINGS</a></kbd>. </ul> <h2><a name="up332">Upgrading from 3.32 and earlier</a></h2> <ul> <li>The form interface has been completely rewritten, and old versions of <kbd>anlgform.html</kbd> will not work with this version. <li>The Browser Summary now diagnoses MSIE, Opera and WebTV browsers better. This will cause differences in output from previous versions. <li>With <kbd>RAWBYTES OFF</kbd>, bytes are now listed as, for example, 47.68 Mbytes instead of 48,832 kbytes. This should be less confusing. <li>The <a href="#dns">DNS file</a> has a new time encoding. It's only a few hours different, so I haven't made any special provision for it. The effect is that the <kbd><a href="#dns">DNSGOODHOURS</a></kbd> and <kbd><a href="#dns">DNSBADHOURS</a></kbd> may be a few hours out for existing entries (but not for new ones). <li>Most languages don't work in this beta version, but should be added again by version 4. (The language files are included in the distribution, but contain lots of English strings). </ul> <h2><a name="up33">Upgrading from 3.3 and earlier</a></h2> <ul> <li>There is a new set of graphics in the <kbd>images</kbd> directory, which you will have to move to your web directory. <li>In the Mac version, if a configuration file is dragged onto the analog icon, it is used instead of, not as well as, the default configuration file. </ul> <h2><a name="up32">Upgrading from 3.2 and earlier</a></h2> <ul> <li>In the <a href="#compout">computer-readable output style</a>, the line <kbd>L7</kbd>, the time the last seven days begins after, has been replaced by <kbd>E7</kbd>, the time the last seven days ends. This is for consistency with the other output styles. 
<li>Also in the computer-readable output, there is a new line reporting the floor and the <kbd>SORTBY</kbd> for the report. In 3.11 and earlier, this didn't exist, and in 3.2 it only reported the floor, not the <kbd>SORTBY</kbd>. <li><kbd>%R</kbd> (Mac-style filename) has been abolished in the <kbd>LOGFORMAT</kbd>. Just use plain <kbd>%r</kbd> instead. <li>It is no longer allowed to set the <kbd>CACHEOUTFILE</kbd> to be the same as a previous cache file. <li>The definition of the common log format and related formats changed between 3.11 & 3.2, and again between 3.2 & 3.3. This could cause differences in output, but they are likely to be only very minor. </ul> <h2><a name="up311">Upgrading from 3.11 and earlier</a></h2> <ul> <li>Lines without a particular item now work properly with <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands. For example, if you do an <kbd>INCLUDE</kbd> to look at only certain lines, then lines without that type of item at all will not now be included, whereas previously they would have been. This can make the results lower than in these earlier versions. <li>I have reluctantly removed support for NetPresenz logs. This hasn't worked for some time, and I have already been advising NetPresenz users not to use newer versions of analog because they could get wrong results. Unfortunately, fixing it would require a complete rewrite of the entire parsing code, which isn't going to happen any time soon. So my advice remains the same: continue to use version 2.11 or (even better) pre-process your logfiles into a form which analog can handle safely. <li>The English domains file has changed name from <kbd>domains.tab</kbd> to <kbd>ukdom.tab</kbd>. 
</ul> <h2><a name="up30winform">Upgrading from 3.0, Win32 form interface</a></h2> <ul> <li>If using the form interface on Windows, it is now necessary to put the analog executable at <kbd>\analog\analog.exe</kbd> instead of <kbd>\Program Files\analog\analog.exe</kbd> </ul> <h2><a name="up290b1">Upgrading from 2.90beta1</a></h2> <ul> <li><kbd>LOGFORMAT MICROSOFT</kbd> has been replaced by <kbd>LOGFORMAT MICROSOFT-NA</kbd> and <kbd>LOGFORMAT MICROSOFT-INT</kbd>; and similarly for <kbd>LOGFORMAT NETPRESENZ</kbd>. </ul> <h2><a name="up211">Upgrading from 2.11 and earlier</a></h2> <ul> <li>It is possible that there may be small discrepancies between the results from previous versions and the results from this version because the parsing code has changed, but any such differences should be minor. However... <li>If you used to use <kbd>REFEXCLUDE</kbd> or <kbd>BROWEXCLUDE</kbd>, you most likely now want <kbd><a href="#outputexcludes">REFREPEXCLUDE</a></kbd> or <kbd><a href="#outputexcludes">BROWREPEXCLUDE</a></kbd> instead, or you will exclude lots of lines that were previously included. <li>It is possible that this version may not automatically parse a logfile that previous versions could parse, because it checks more carefully that the logfile is in the format claimed. If so, you will have to use a <kbd><a href="#logfmt">LOGFORMAT</a></kbd> command. <li>Approximate host counting has been abolished, unless there's a significant demand for it. <li>Count of number of new hosts in last seven days abolished. It was too confusing because it depended on which old logfiles you analysed. <li>The Error Report has been abolished (together with the configuration commands <kbd>ERROR</kbd>, <kbd>ERRLOG</kbd> and <kbd>ERRMINOCCS</kbd>). See the <cite><a href="#whatsnew">What's new?</a></cite> page. <li>The <kbd>BROWLOG</kbd> and <kbd>REFLOG</kbd> commands have also been abolished: just use <kbd><a href="#logfile">LOGFILE</a></kbd> instead. 
<li>The <kbd>HASHSIZE</kbd> commands have been abolished: analog now chooses the size of the hash tables itself. <li>The <kbd>MINREQS</kbd> and similar options have been replaced by the <kbd><a href="#FLOOR">FLOOR</a></kbd> commands. <li>Only one <kbd>*</kbd> is now allowed on the left-hand side of aliases, to avoid ambiguities. <li>Automatic detection of log type is now on a per-file rather than a per-line basis. <li><kbd>ISPAGE</kbd> is now called <kbd><a href="#PAGEINCLUDE">PAGEINCLUDE</a></kbd>. <li><kbd>WITHARGS</kbd> and <kbd>REFWITHARGS</kbd> are now called <kbd><a href="#ARGSINCLUDE">ARGSINCLUDE</a></kbd> and <kbd><a href="#ARGSINCLUDE">REFARGSINCLUDE</a></kbd>. <li><kbd>MONTHLYBACK</kbd> is now called <kbd>MONTHBACK</kbd>. <li><kbd>FULLHOSTS</kbd> is now just called <kbd>HOST</kbd>. <li><kbd>LOGOURL</kbd> is now called <kbd>LOGO</kbd>. <li>The <kbd>UNIT</kbd> commands have been abolished. They weren't very useful, and they didn't make much sense with the different ways of displaying the time report bar charts. The unit is now always chosen automatically. <li><kbd>DIRLEVEL</kbd> has been abolished, because the <kbd><a href="#hierreps">SUBDIR</a></kbd> command is more general. Use <kbd>SUBDIR */*</kbd> or whatever instead. <li>Comments aren't allowed in the <a href="#domfile">domains file</a>. I don't think this should cause a problem. <li><kbd>GRAPHICAL</kbd> is abolished. Instead, use lower case letters with the <kbd>GRAPH</kbd> commands. <li><kbd>NUMLOOKUP</kbd> has been replaced by <kbd><a href="#dns">DNS</a></kbd>, and <kbd>DNSFRESHHOURS</kbd> by the commands <kbd><a href="#dns">DNSGOODHOURS and DNSBADHOURS</a></kbd>. <li>DNS cache files from previous versions are not compatible with this version. <li>You can't use <kbd>PAGES</kbd> in the columns or <kbd>SORTBY</kbd> or <kbd>FLOOR</kbd> for the Request Report. Use <kbd>REQINCLUDE pages</kbd> instead. 
<li><kbd>-</kbd> as a synonym for <kbd>none</kbd> has been abolished in some places, e.g., <kbd>HOSTURL</kbd>. <li>The following command line arguments have been abolished from earlier versions, many of the letters getting new meanings: <kbd>7</kbd>, <kbd>l</kbd>, <kbd>n</kbd>, <kbd>p</kbd>, <kbd>s</kbd>, <kbd>u</kbd>, <kbd>v</kbd>, <kbd>w</kbd>. (<kbd>-v</kbd> has moved to <kbd>-settings</kbd>.) Others have been changed since version 1.2 as well. </ul> <h2><a name="up20win">Upgrading from 2.0, Win32 users</a></h2> <ul> <li>Filenames for logfiles etc. should now be given DOS-style, with backslashes, rather than Unix-style with forward slashes. </ul> <h2><a name="up192mac">Upgrading from 1.92 and earlier, Mac users</a></h2> <ul> <li>There is no longer an automatic progress report. Use the <kbd><a href="#PROGRESSFREQ">PROGRESSFREQ</a></kbd> command instead. </ul> <h2><a name="up19b">Upgrading from 1.9beta's</a></h2> <ul> <li>Use <kbd><a href="#outputexcludes">REQINCLUDE</a></kbd> and <kbd><a href="#LINKINCLUDE">LINKINCLUDE</a></kbd> instead of <kbd>REQTYPE</kbd> and <kbd>PAGELINKS</kbd>. </ul> <h2><a name="up12">Upgrading from 1.2's and earlier</a></h2> <ul> <li>Use <a href="#include"><kbd>*INCLUDE</kbd> and <kbd>*EXCLUDE</kbd></a> instead of <kbd>*ONLY</kbd> and <kbd>*IGNORE</kbd>. <li>The syntax of the <kbd>*FLOOR</kbd> commands has changed. <li>Use <kbd>*SORTBY REQUESTS</kbd> or <kbd>BYTES</kbd> instead of <kbd>*SORTBY BYREQUESTS</kbd> or <kbd>BYBYTES</kbd>. </ul> <hr> <hr> <a name="wasnew3"><h2>What was new in version 3?</h2> </a> This section lists the new features which were in version 3 of analog. <dl> <dt><b><a href="#whatsnew">What's new in version 4?</a></b> <dt><b><a name="new332">3.32</a></b> (02-Sep-99) <dd>Bug fixes, including: <ul> <li>Drag-and-drop on Mac now works. <li>Unsafe characters in hyperlinks now escaped. <li>One bug that caused crashes when printing deep Directory Reports fixed. </ul> New VMS build scripts. 
Let me know of any compilation problems. <br>Computer-readable output now reports version of analog used. <br>Improved some diagnostic messages. <br>New language Serbo-Croatian; new domains files for Italian and Russian; corrected Polish language files. <br>New documentation on <cite><a href="#reports">Analog's reports</a></cite> and <cite><a href="#quickref">Quick reference</a></cite>. <br>Now uses named anchors throughout the documentation, so that cross-references link to the right part of a page. <dt><b><a name="new331">3.31</a></b> (19-Jun-99) <dd>New command <kbd><a href="#BARSTYLE">BARSTYLE</a></kbd>; you will need to <a href="#up33">use new images</a>. <br>Russian language file corrected. <br>Some bug fixes, including one important one correcting cache file output. <dt><b><a name="new33">3.3</a></b> (19-May-99) <dd>New commands <kbd><a href="#ERRFILE">ERRFILE</a></kbd>, <kbd><a href="#dns">DNSLOCKFILE</a></kbd>, <kbd><a href="#Apache">APACHELOGFORMAT</a></kbd> and <kbd><a href="#DEFAULTLOGFORMAT">APACHEDEFAULTLOGFORMAT</a></kbd>. <br>Can include the date in the name of the <kbd><a href="#OUTFILE">OUTFILE</a></kbd> and the <kbd><a href="#cache">CACHEOUTFILE</a></kbd>. <br>Support for WebSite logfiles. <br>New token <kbd>%U</kbd> in <a href="#fmtstrings">log formats</a> for "Unix time" (seconds since 1970). <br>Won't overwrite old cache files. <br>Now works properly on SunOS 4. <br>Fix for occasional crashes on Windows. <br>Checks language files are not too long. <br>"Last seven days" data now calculated more accurately and displayed more clearly. <br>Computer-readable output now reports <kbd>SORTBY</kbd>'s as well as floors. <br>Revised Makefile will work with older make's. <br>Corrected Catalan language files. <br>Includes form interfaces in French and Japanese. <br><kbd>LOGFORMAT</kbd> documentation now includes the <a href="#fmtexamples"><kbd>LOGFORMAT</kbd> commands</a> for all built-in log formats. 
<dt><b><a name="new32">3.2</a></b> (04-May-99) <dd>Bug fixes: in particular <kbd>REFLINKINCLUDE pages</kbd> now works; and cache files now include all items even if they're not wanted for the main report. <br>Lines without a particular item now work properly with <kbd>INCLUDE</kbd> and <kbd>EXCLUDE</kbd> commands. This can cause <a href="#up311">differences in results</a> from previous versions. <br>New version of form interface to work round bug in Microsoft Internet Information Server. <br>New command <kbd><a href="#NOROBOTS">NOROBOTS</a></kbd>. <br>Backslashes are now coerced to forward slashes in filenames and usernames. While not always correct technically, it usually is in practice, and it makes them behave correctly in other parts of the program. <br>Usernames are now treated as case insensitive. Let me know if this causes a problem on any system. <br>Computer-readable output style now reports floors. <br>Rewritten Unix Makefile, and VMS build script. Let me know of any compilation problems. <br>New languages: Catalan, Icelandic, Japanese, Korean, Latvian, Lithuanian. Corrected Spanish language files and French domains file. <br><kbd>LANGUAGE</kbd> now selects local domains file automatically, where available. <br>Removed support for NetPresenz logs. The reasons are in the section on <a href="#up311">how to upgrade</a>. <br><a href="#form">Form interface documentation</a> rewritten; <a href="#faq">FAQ</a> broken into sections; sections on <a href="#logfile">logfiles</a> and <a href="#logfmt">log formats</a> separated and rewritten; new section on <a href="#helpers">helper applications</a>; and dozens of other improvements to the documentation. <dt><b><a name="new311">3.11</a></b> (26-Nov-98) <dd>Bug fix version. <br>Microsoft's attempt at W3 extended format is now understood even if there is a second <tt>#Fields:</tt> line in the logfile. <br>There is also a fix for a new Microsoft bug which results in a non-standard common format. 
<br>Intermittent crashes under Windows fixed. <br>Mailing lists announced. <dt><b><a name="new31">3.1</a></b> (17-Oct-98) <dd>Understands Microsoft's attempt at W3 extended format. <br>Several bugs fixed, including one that caused occasional crashes and one that caused the output to grow and grow. <br>Form interface works on Windows. <br>Allows aliases with two or more *'s on left hand side, if right hand side contains no *'s. <br>Aliases work properly with <kbd>CASE INSENSITIVE</kbd>. <br>Numerical <kbd>SUBDOMAIN</kbd>s fixed. <br>Understands more WebSTAR and Netscape tokens. <br>Accents in domains file work. <br><kbd>LOGFORMAT</kbd> removed from form interface as security risk. <br>Several warning messages improved. <br>Report aliases and in/exclusions shown in <kbd>settings</kbd> output. <br>Character set declared at top of output. <br>Spanish, Dutch, Norwegian (Bokmål and Nynorsk), Finnish, Turkish, Greek, Polish, Russian & Chinese language files included. <dt><b><a name="new30">3.0</a></b> (15-Jun-98) <dd>Corrected W3 extended format. <br>Fix for broken <kbd>strcmp()</kbd> function on SunOS 5. <br>Portuguese, Brazilian Portuguese, Danish and Hungarian language files included. <br>Precompiled executable for OS/2 available. <dt><b><a name="new291b1">2.91beta1</a></b> (04-Jun-98) <dd>Form interface included. <br>Uses less memory when compiling reports. <br>New operating system, BS2000/OSD, and code for EBCDIC character set. <br>New command <kbd><a href="#DEFAULTLOGFORMAT">DEFAULTLOGFORMAT</a></kbd>. <br><kbd><a href="#LASTSEVEN">LASTSEVEN</a></kbd> and <kbd><a href="#BASEURL">BASEURL</a></kbd> reinstated. <br>More information added to <kbd>PRINTVARS</kbd> output. <br>AppleScript support for Unix-style command lines added to Mac version. <br>Now works on SunOS 4, and other small bug fixes. <br>French, German, Swedish, Czech, Slovak, Slovene and Romanian language files included. <br>One page version of the Readme included in the documentation. 
<dt><b><a name="new290b4">2.90beta4</a></b> (09-Apr-98) <dd>Mended DNS cache file reading, which I broke in yesterday's release. <dt><b><a name="new290b3">2.90beta3</a></b> (08-Apr-98) <dd>Fixed bug that caused a crash while giving warning messages on SunOS; bug that caused configuration files that called other configuration files not to be completed; and other smaller bugs. <br>Italian language files included. <dt><b><a name="new290b2">2.90beta2</a></b> (03-Apr-98) <dd>Separate <kbd>LOGFORMAT</kbd>s for North American and international date formats, when using Microsoft or Netpresenz logs. <br>Understands the AppleShare IP server's attempt at the WebSTAR format. <br>Directory report now works properly even if you use the second argument to the <kbd><a href="#secondarg">LOGFILE</a></kbd> command. <br>Wild cards in filenames work properly on the Mac. <br>Other small bug fixes. <br>One speed improvement (I gain about 3%). <br>Several corrections and clarifications to the documentation. <dt><b><a name="new290b1">2.90beta1</a></b> (27-Mar-98) <dd>This version is a completely rewritten version. Every single line of code is new. The whole code is shorter despite considerable improvements in functionality. Several people have reported that it is significantly faster. The most important new features are: <ul> <li>Eleven new reports (Quarter-Hour, Five-Minute, Redirection, Failure, File Size, Referring Site, Redirected Referrer, Failed Referrer, Virtual Host, User, User Failure). <li>Reads logfiles in user-customisable format. <li>Analyses user and virtual host data, and failed requests. <li>Hierarchical reports list subdirectories under directories, and allow analysis of browser version numbers. <li>Faster sorting of long reports. <li>Floor and sort method made independent. <li>"Last date" column in reports, and can floor and sort by date. <li>Busiest time period at bottom of time reports. <li>"Not listed" line at bottom of other report. <li>Knows HTTP/1.1 status codes. 
<li>General Summary can go anywhere in the report. <li>General Summary and "Go To"s can now be turned on and off independently. <li>Status Code Report can be sorted in different ways. <li><a href="#TIMEOFFSET">Time offset</a> commands. <li>Much better checking of invalid configuration options and invalid logfile lines. <li>Only reads logfiles it might need. <li>Improvements in DNS functionality: can now read the DNS file without further lookups: also, separate recheck intervals for successful and failed lookups. <li>Hash sizes now chosen automatically. <li>More flexible language support. <li>Mac version reads gzipped logfiles. <li>Mac version supports drag-and-drop onto program icon. <li>Readme files completely re-written. Broken into lots of files, and new sections on <cite><a href="#start">Starting to use analog</a></cite> and <cite><a href="#meaning">What the results mean</a></cite>, as well as an <a href="#indx">index</a>. </ul> <a name="abolished290b1">The following features</a> have been abolished. <ul> <li>No Error Report. The error log was always intended for humans rather than computers to read. Moreover, its format varied from server to server, and even between different versions of the same server. The place of the Error Report has largely been taken by the new reports, particularly the Failure Report. <li>The approximate host counting has been abolished for the time being. I can put it back if there is a significant demand for it. <li>Only one <kbd>*</kbd> can now appear on the left-hand side of aliases. This is to avoid ambiguities. <li>For changes in the names and syntax of configuration options and command line arguments, see the section about <a href="#update">upgrading</a>. </ul> The following features are not yet present, but will be added by version 3. <ul> <li>The form interface. <li>Most of the languages. 
</ul> <dt><b><a href="#wasnew2">What was new in version 2?</a></b> <dt><b><a href="#wasnew1">What was new in version 1?</a></b> </dl> <hr> <hr> <a name="wasnew2"><h2>What was new in version 2?</h2> </a> This section lists the new features which were in version 2 of analog. <dl> <dt><b><a href="#whatsnew">What's new in version 4?</a></b> <dt><b><a href="#wasnew3">What was new in version 3?</a></b> <dt><b>2.11</b> (14-Mar-97) <dd>Minor bug fixes to yesterday's release. <dt><b>2.1</b> (13-Mar-97) <dd>Language support rewritten, causing reduction in code size of 2200 lines. <br>New configuration command <kbd>LANGFILE</kbd>. <br>New Acorn RiscOS version. <br>Page requests per day reported. <br>Bug fix: <kbd>CASE INSENSITIVE</kbd> could cause <kbd>%7E</kbd>-type conversions not to take place. <dt><b>2.0.2</b> (04-Mar-97) <dd>DNS lookups and wildcards should now work in the Win32 version. <br>New configuration command <kbd>PRINTVARS</kbd>. <br>Fix for zero length hostnames after DNS lookups. <br>Minor corrections in French and Spanish translations. <dt><b>2.0</b> (10-Feb-97) <dd>New native Win32 version. <br>Wildcards allowed in filenames on Mac. <br>Ignores browser "-". <dt><b>1.93beta</b> (18-Jan-97) <dd>New commands <kbd>BROWALIAS</kbd>, <kbd>CONFIGFILE</kbd> and <kbd>PROGRESSFREQ</kbd>. <br>Form program can now call configuration files. <br>Form program now uses the default choices if none specified. <br>Domain report prints correctly in preformatted output. <br>Specifying +1 and +V2 doesn't crash the program. <br>-v reports dates correctly. <br>Trailing dots on hostnames removed. <br>Second argument to <kbd>LOGFILE</kbd> command can't be obliterated by <kbd>/../</kbd> <dt><b>1.92beta</b> (08-Oct-96) <dd>DNS lookups added on Mac. <br>Netpresenz format understood on Mac. <br>New languages: Spanish, Italian and Danish. <br>Extra information when debugging turned on. <br><code>*.htm</code> are now pages on all machines. <br>A few small bugs fixed. 
<dt><b>1.91beta4</b> (13-Jul-96) <dd>Cache file now includes page request information. <br>DNS bug fixed. <br>New command <kbd>DNSHASHSIZE</kbd>. <br>Bug in browser reports fixed. <dt><b>1.91beta3</b> (09-Jul-96) <dd>BSD/OS compilation bug believed fixed. <br>Fixed <kbd>HOSTALIAS</kbd> which I broke yesterday. <br>DNS bug (causing too many lookups) identified, although not yet fixed. <dt><b>1.91beta2</b> (08-Jul-96) <dd>Some bug fixes (including: <kbd>HOSTEXCLUDE</kbd> and <kbd>CASE INSENSITIVE</kbd> didn't work properly; selecting "no links" failed on the form; less fussy about what can appear on the form). <br>Mac version no longer includes source code, so is much shorter. <dt><b>1.91beta1</b> (05-Jul-96) <dd>Now DNS code doesn't look up a name twice, even if one is a failed request. <dt><b>1.91beta</b> (05-Jul-96) <dd>Will now output in any of several languages. <br>Preformatted output introduced. <br>New File Type Report. <br>Can limit the number of rows in the time reports. <br>Number of requests for pages (as opposed to raw requests) now calculated throughout. <br>DNS lookup returns, with caching across runs. <br>Logfiles can include wildcards. <br>Wildcards can include multiple *'s. <br>Can process case insensitive logfiles. <br><kbd>OUTPUTALIAS</kbd> commands introduced. <br>New commands to specify exactly what is included, and what linked, in the request report and referrer report. <br><tt>FILEALIAS a a</tt> and <tt>FILEALIAS a b; FILEALIAS b c</tt> now work. <br>New <kbd>ALLOW</kbd> options to cancel <kbd>INCLUDES</kbd>. <br><kbd>REPSEPCHAR</kbd> and <kbd>DECPOINT</kbd> introduced. <br><kbd>DIRSUFFIX</kbd> introduced. <br>Debugging reports number of corrupt lines in other logs. <br>Hash sizes can now be allocated at run time. <br>stdin can now be used for any input file, but not for two. <br>Macintosh version now quits automatically if no warnings have been issued. <br>Form interface made more secure. 
<br>"Mozilla (compatible)" separated out in Browser Summary. <br>Major internal changes should improve speed. <br>Code for non-Unix platforms integrated into main code. <br>"Referrer" spelled correctly. <br>Licence introduced. <br>Update file introduced. <br>Readme updated to include non-Unix instructions. <dt><b>(19-Apr-96)</b> <dd>First Mac version. <dt><b>1.9beta6</b> <dd>Two bug fixes (number of bytes was incorrectly reported in some cases, and <kbd>-v</kbd> would overwrite the <kbd>OUTFILE</kbd>). <br>Documentation improved. <dt><b>1.9beta5</b> <dd>More bug fixes... <dt><b>1.9beta4</b> <dd>One important bug fix (I broke <kbd>GRAPHICAL OFF</kbd> in 1.9beta3). <br>New form cgi options: <kbd>ch</kbd>, <kbd>gr</kbd> and <kbd>ou=3</kbd>. <br>Code shortened. <dt><b>(05-Mar-96)</b> <dd>First DOS version. <dt><b>1.9beta3</b> <dd>Mainly bug fixes and improved documentation. <br>Browser and referer reports now include failed requests. <br>The <kbd>WARNINGS</kbd> option can now be specified on the form. <dt><b>1.9beta2</b> <dd>Small bug fixes <dt><b>1.9beta</b> (06-Feb-96) <dd>Lots of changes. The most important new features are <ul> <li>Six new reports (hourly report, browser report, browser summary, referer report, status code report and error report). <li>Analysis of NCSA/Apache referer log, agent log and combined log formats. <li>Graphical time reports that still work on text-based browsers. <li>Configurable columns in the time reports. <li>Time reports can run backwards. <li>Time graphs can be plotted by bytes instead of by requests. <li>Can cache old data so that old logfiles need not be kept. <li>Can process several logfiles. <li>Can combine logfiles from several different hosts. <li>Will uncompress compressed logfiles. <li>All configuration options can now be specified on the commandline. <li>Mandatory configuration file added. <li>Lots of new options in the form processing program. <li>Wildcards greatly improved throughout. 
<li>Alphabetical host report right-aligned. <li>Bytes now quoted as MBytes etc. instead of long number. <li>Produces HTML2.0 compliant output. <li>New sort method <kbd>RANDOM</kbd> (saves time for long reports). <li>Floors for reports now work properly. <li>Can now specify a report <kbd>FROM</kbd> 100 or more days ago. <li>Option to turn off warnings. <li>Considerable savings in code length over previous versions. </ul> <dt><b><a href="#wasnew1">What was new in version 1?</a></b> </dl> <hr> <hr> <a name="wasnew1"><h2>What was new in version 1?</h2> </a> This section lists the new features which were in version 1 of analog. <dl> <dt><b><a href="#whatsnew">What's new in version 4?</a></b> <dt><b><a href="#wasnew3">What was new in version 3?</a></b> <dt><b><a href="#wasnew2">What was new in version 2?</a></b> <dt><b>1.2.6</b> <dd>Minor bug fix; will only affect those with corrupt logfiles. <dt><b>1.2.5</b> <dd>Minor bug fix for weekly report. <dt><b>1.2.4</b> <dd>Patch for Spyglass server logfile format. <dt><b>1.2.3</b> <dd>A couple of bug fixes (wild subdomains sometimes caused crashes). <br><kbd>-v</kbd> option now gives the version number. <dt><b>1.2.2</b> <dd>Patch for proxy servers: <kbd>http://</kbd> not translated to <kbd>http:/</kbd> <dt><b>1.2</b> (11-Nov-95) <dd>Can configure columns in reports to give percentage requests and number of bytes. <br>Wild subdomains (e.g., *.com). <br>Nameless subdomains. <br>Subdomains now listed in alphabetical order. <br>Proper support for numerical hostnames in <kbd>HOSTIGNORE</kbd>, <kbd>HOSTONLY</kbd>, <kbd>SUBDOMAIN</kbd> and alphabetical sorting. <br>New <kbd>BASEURL</kbd> command allowing statistics to be displayed on other servers. <br>Output always says how things are sorted. <br>"Last 7 days" now behaves sensibly with <kbd>TO</kbd>. <br>Filenames containing <kbd>/../</kbd>, <kbd>/./</kbd> and <kbd>//</kbd> translated. <br>Header and footer options removed from form (for security reasons). 
<dt><b>1.1</b> (02-Oct-95) <dd>Form interface introduced. <br>ASCII output now possible as well as HTML. <br>Output file can now be specified in the configuration file. <br><kbd>FROM</kbd> and <kbd>TO</kbd> commands more powerful. <br><kbd>DEBUG</kbd> and <kbd>BACKGROUND</kbd> introduced. <br>One bug fix: alphabetical sorting doesn't now swap some hostnames. <br>List of primes included in distribution. <dt><b>1.0</b> (12-Sep-95) <dd>Only minor changes since 0.94beta. <dt><b>0.94beta</b> (30-Aug-95) <dd>New configuration variables <kbd>SEPCHAR</kbd> and <kbd>REPORTORDER</kbd>. <br>New configuration commands <kbd>WITHARGS</kbd> and <kbd>WITHOUTARGS</kbd>. <br>New commandline options <kbd>+-A</kbd> and <kbd>+-x</kbd>. (Config.: <kbd>ALL</kbd> and <kbd>GENERAL</kbd>). <br>Logfile entries with - as the return code are now regarded as successes, not corrupt entries. <br>Fixed bugs in host report when aliases or numerical hosts are present. <br>Documentation rewritten. <dt><b>0.93beta</b> (27-Jul-95) <dd>Approximate hostname counting now possible in fixed memory. <br>New configuration commands <kbd>ISPAGE</kbd> and <kbd>ISNOTPAGE</kbd>. <br>New commandline option <kbd>-v</kbd>. <br>New configuration command <kbd>WEEKBEGINSON</kbd>. <br>Proper error message when memory exceeded. <br>Program split into several files. <dt><b>0.92beta</b> (11-Jul-95) <dd>New reports introduced: hostname, full daily, and weekly. <br><kbd>FROM</kbd> and <kbd>TO</kbd> commands introduced. <br>Header and footer files introduced. <br>More helpful warning messages. <br>Ability to read configuration instructions from stdin. <br>Subdomain commands moved from domains file to configuration file. <br>Makefile provided. <dt><b>0.91beta</b> (04-Jul-95) <dd>Configuration file introduced, enabling many new options. <br>Some bug fixes and speed improvements. <br>Ability to print "top n" reports (rather than "everything higher than n"). <br>Request report can print only pages. 
<br>Ability to try and resolve numerical addresses. <br>Now less fussy about the format of the domains file. <br>Logo added. <br>Readme converted to HTML. <dt><b>0.9beta</b> <dd>More speed improvements, and some bug fixes. <br>Introduced <kbd>-u</kbd> option. <br>Introduced subdomain analysis. <br>Included "not modified" replies as successes, not redirects. <br>First public release at 0.9beta3. (29-Jun-95) <dt><b>0.89beta</b> (21-Jun-95) <dd>Commandline arguments. <br>Efficiency improvements. <br>Host count and "last 7 day" statistics. <dt><b>0.8beta</b> (14-Jun-95) <dd>Initial program, just default options. </dl> <hr> <hr> <a name="quickref"><h2>Quick reference</h2> </a> This section is a list of all of analog's configuration commands, together with a quick reference to their syntax and some examples. It's designed for those who are already familiar with the program, so it's pretty useless for trying to learn the program: to learn about the commands, read the section on <cite><a href="#custom">Customising analog</a></cite> instead, or consult the <a href="#indx">index</a> for a reference. I would <a href="#mailing">welcome feedback</a> on this new section.
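<p> As a quick orientation, here is a minimal sketch of a configuration file assembled from commands and examples given in the parts below. The file and directory names are only illustrative assumptions, not defaults, so substitute the locations used on your own system. Note that <kbd>LOGFORMAT</kbd> is placed before <kbd>LOGFILE</kbd>, because it only affects logfiles occurring later in the same configuration file.
<pre>
# Sketch only: all paths and filenames here are hypothetical
LOGFORMAT COMBINED
LOGFILE /httpd/logs/*
OUTFILE /usr/local/analog%y%M.html

# Resolve numerical addresses, keeping the results in a DNS file
DNSFILE dnscache.txt
DNS WRITE

# Leave some hosts out of all reports
HOSTEXCLUDE proxy*.aol.com

# Turn every report off, then switch on the ones wanted
ALL OFF
REQUEST ON
FILETYPE ON
FULLHOURLY ON

# Sorting and floors for individual reports
REQSORTBY REQUESTS
TYPEFLOOR -20r

# Diagnostics
WARNINGS -DL
</pre>
Each of these commands is described, with its full syntax, under the corresponding heading below.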
<p> This section is divided into the following parts: <ul>
<li><a href="#quicknot">Notation</a>
<li><a href="#quickfiles">Input and output files</a>
<li><a href="#quickfmt"><kbd>LOGFORMAT</kbd> commands</a>
<li><a href="#quickalias"><kbd>ALIAS</kbd> commands</a>
<li><a href="#quickinclude"><kbd>INCLUDE/EXCLUDE</kbd> commands</a>
<li><a href="#quickdns">DNS commands</a>
<li><a href="#quicksub">Sub-item commands</a>
<li><a href="#quicklowmem"><kbd>LOWMEM</kbd> commands</a>
<li><a href="#quickrep">Report commands</a>
<li><a href="#quickgraph"><kbd>GRAPH</kbd> commands</a>
<li><a href="#quickback"><kbd>BACK</kbd> commands</a>
<li><a href="#quickrows"><kbd>ROWS</kbd> commands</a>
<li><a href="#quickcols"><kbd>COLS</kbd> commands</a>
<li><a href="#quicksortby"><kbd>SORTBY</kbd> commands</a>
<li><a href="#quickfloor"><kbd>FLOOR</kbd> commands</a>
<li><a href="#quicklinks">Hyperlinks</a>
<li><a href="#quicklang">Language commands</a>
<li><a href="#quickcosmetic">Cosmetic and miscellaneous commands</a>
<li><a href="#quickdebug">Diagnostics</a>
</ul> <h3><a name="quicknot">Notation</a></h3> The syntax for each command is given using the following notation. <pre>
"stuff"          the word stuff
x y              x followed by y
(x | y)          x or y
[x]              optional x
subset("...")    any letters from the string, in any order
perm("...")      all the letters from the string, in any order
*x               x may contain wildcards * and ? (and often comma-separated list)
x := y           x is defined to be y
COMMAND          the command under discussion
</pre> In addition, I use the following names for different types of argument. <pre>
char             a single character
string           a string
digit            a digit
number           a non-negative integer (i.e. a string of digits)
real             a non-negative real number
regexp           a POSIX extended regular expression
file             a filename within your server's filespace; e.g. /index.html
localfile        a filename within your system's filespace; e.g. /usr/local/analog.html
localfmtfile     as localfile, but may contain <a href="#OUTFILE">date codes</a>; e.g. /usr/local/analog%y%M.html
referrer         a URL of a referring page; e.g. http://search.yahoo.com/
URL              a URL which may be absolute, or relative to the output page;
                 e.g. images/ or /~fred/images/ or http://www.fred.com/images/
</pre> <p>Note: I have occasionally opted for clarity above strict accuracy where I don't think it will cause any confusion! <p>The syntax for commands in general was given <a href="#syntax">earlier</a>: remember that an argument which contains a hash or a space must be put in quotes or parentheses. <h3><a name="quickfiles">Input and output files</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <a href="#logfile">LOGFILE</a> (*localfile | "-" | "none") [prefix_string] <br><a href="#OUTFILE">OUTFILE</a> (localfmtfile | "-" | "none") <br><a href="#cache">CACHEFILE</a> (*localfile | "-" | "none") <br><a href="#cache">CACHEOUTFILE</a> (localfmtfile | "-" | "none") <br><a href="#UNCOMPRESS">UNCOMPRESS</a> *localfile program </kbd> <dt><i>Examples</i> <dd><kbd> LOGFILE /httpd/logs/* <br>LOGFILE c:\logs\log1,c:\logs\log2 <br>OUTFILE "Hard Disk:Report%Y%M.html" <br>UNCOMPRESS *.gz "/usr/bin/gzip -cd"</kbd> </dl> <h3><a name="quickfmt"><kbd>LOGFORMAT</kbd> commands</a></h3> <dl> <dt><i>Syntax</i> <dd><pre>
format_string := (<a href="#fmtstrings">see documentation</a>)
Apache_format_string := (see <a href="http://www.apache.org/docs/mod/mod_log_config.html">Apache documentation</a>)
logformat := ("COMMON" | "COMBINED" | "REFERRER" | "BROWSER" | "EXTENDED" |
              "MICROSOFT-NA" | "MICROSOFT-INT" | "WEBSITE-NA" | "WEBSITE-INT" |
              "MS-EXTENDED" | "MS-COMMON" | "NETSCAPE" | "WEBSTAR" | "AUTO" |
              format_string)
<a href="#logfmt">LOGFORMAT</a> logformat
<a href="#DEFAULTLOGFORMAT">DEFAULTLOGFORMAT</a> logformat
<a href="#Apache">APACHELOGFORMAT</a> Apache_format_string
<a href="#DEFAULTLOGFORMAT">APACHEDEFAULTLOGFORMAT</a> Apache_format_string
</pre> <dt><i>Notes</i> <dd><kbd>LOGFORMAT</kbd> and <kbd>APACHELOGFORMAT</kbd> only affect logfiles occurring later in the same
configuration file. <dt><i>Examples</i> <dd><kbd> LOGFORMAT (%S - %u [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b) <br>DEFAULTLOGFORMAT MS-EXTENDED <br>APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b)</kbd> </dl> <h3><a name="quickalias"><kbd>ALIAS</kbd> commands</a></h3> <dl> <dt><i>1. Commands (items)</i> <dd><kbd> <a href="#useraliases">FILEALIAS</a>, <a href="#useraliases">HOSTALIAS</a>, <a href="#useraliases">BROWALIAS</a>, <a href="#useraliases">REFALIAS</a>, <a href="#useraliases">USERALIAS</a>, <a href="#useraliases">VHOSTALIAS</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND *olditem newitem <br>COMMAND ("REGEXP:" | "REGEXPI:")regexp newitem</kbd> <dt><i>Notes</i> <dd>Aliases item in all reports. Items with the same resultant name are combined. <kbd>newitem</kbd> may contain <kbd>$1</kbd>, <kbd>$2</kbd> etc., representing the <kbd>*</kbd>'s in <kbd>olditem</kbd> or the bracketed subexpressions in <kbd>regexp</kbd>. Regular expressions are only available on some platforms. <dt><i>Examples</i> <dd><kbd>FILEALIAS /*/football/* /$1/soccer/$2</kbd> <dd><kbd>USERALIAS REGEXP:^([^U].*) U$1</kbd> <p><dt><i>2. 
Commands (reports)</i> <dd><kbd> <a href="#OUTPUTALIAS">TYPEOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">HOSTOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">REQOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">REDIROUTPUTALIAS</a>, <a href="#OUTPUTALIAS">FAILOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">DIROUTPUTALIAS</a>, <a href="#OUTPUTALIAS">DOMOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">ORGOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">REFOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">REFSITEOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">REDIRREFOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">FAILREFOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">BROWOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">FULLBROWOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">OSOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">VHOSTOUTPUTALIAS</a>, <a href="#OUTPUTALIAS">USEROUTPUTALIAS</a>, <a href="#OUTPUTALIAS">FAILUSEROUTPUTALIAS</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND *item string <br>COMMAND ("REGEXP:" | "REGEXPI:")regexp string</kbd> <dt><i>Notes</i> <dd>Aliases item on one line of one report only. <kbd>string</kbd> may contain <kbd>$1</kbd>, <kbd>$2</kbd> etc., representing the <kbd>*</kbd>'s in <kbd>item</kbd> or the bracketed subexpressions in <kbd>regexp</kbd>. Regular expressions are only available on some platforms. <dt><i>Examples</i> <dd><kbd>REQOUTPUTALIAS /football/ "/football/ (Main football page)"</kbd> <dd><kbd>REFOUTPUTALIAS REGEXP:^(http://([^/]*\.)?(maths|stats)\.uxy\.edu.*) ([$3] $1)</kbd> <p><dt><i>3. 
Other commands: syntax</i> <dd><kbd> <a href="#CASE">CASE</a> ("SENSITIVE" | "INSENSITIVE") <br><a href="#CASE">USERCASE</a> ("SENSITIVE" | "INSENSITIVE") <br><a href="#SCC">SEARCHCHARCONVERT</a> ("ON" | "OFF") <br><a href="#DIRSUFFIX">DIRSUFFIX</a> suffix <br><a href="#TIMEOFFSET">LOGTIMEOFFSET</a> ["+" | "-"] number <br><a href="#TIMEOFFSET">TIMEOFFSET</a> ["+" | "-"] number</kbd> <dt><i>Examples</i> <dd><kbd> CASE SENSITIVE <br>DIRSUFFIX index.htm <br>LOGTIMEOFFSET -300 </kbd> </dl> <h3><a name="quickinclude"><kbd>INCLUDE/EXCLUDE</kbd> commands</a></h3> <dl> <dt><i>1. Commands (items)</i> <dd><kbd> <a href="#include">FILEINCLUDE</a>, <a href="#include">FILEEXCLUDE</a>, <a href="#include">HOSTINCLUDE</a>, <a href="#include">HOSTEXCLUDE</a>, <a href="#include">BROWINCLUDE</a>, <a href="#include">BROWEXCLUDE</a>, <a href="#include">REFINCLUDE</a>, <a href="#include">REFEXCLUDE</a>, <a href="#include">USERINCLUDE</a>, <a href="#include">USEREXCLUDE</a>, <a href="#include">VHOSTINCLUDE</a>, <a href="#include">VHOSTEXCLUDE</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND (*item | "") <br>COMMAND ("REGEXP:" | "REGEXPI:")regexp</kbd> <dt><i>Notes</i> <dd>Excludes all logfile entries containing an excluded item from all reports. Includes and excludes are done after aliases, so the <kbd>item</kbd> is the aliased name, if applicable. Regular expressions are only available on some platforms. <dt><i>Examples</i> <dd><kbd> FILEINCLUDE /jim/* <br>FILEINCLUDE REGEXP:^/~[^/]*/$ <br>HOSTEXCLUDE proxy*.aol.com <br>USEREXCLUDE "" </kbd> <p><dt><i>2. Syntax (including and excluding dates)</i> <dd><kbd> partdate := ["+" | "-"] digit digit <br>date := partdate partdate partdate [":" partdate partdate] <br><a href="#FROMTO">FROM</a> date <br><a href="#FROMTO">TO</a> date</kbd> <dt><i>Examples</i> <dd><kbd> FROM 990719:1200 <br>TO -00-0101 </kbd> <p><dt><i>3. 
Commands (reports)</i> <dd><kbd> <a href="#outputexcludes">REQINCLUDE</a>, <a href="#outputexcludes">REQEXCLUDE</a>, <a href="#outputexcludes">REDIRINCLUDE</a>, <a href="#outputexcludes">REDIREXCLUDE</a>, <a href="#outputexcludes">FAILINCLUDE</a>, <a href="#outputexcludes">FAILEXCLUDE</a>, <a href="#outputexcludes">TYPEINCLUDE</a>, <a href="#outputexcludes">TYPEEXCLUDE</a>, <a href="#outputexcludes">DIRINCLUDE</a>, <a href="#outputexcludes">DIREXCLUDE</a>, <a href="#outputexcludes">HOSTREPINCLUDE</a>, <a href="#outputexcludes">HOSTREPEXCLUDE</a>, <a href="#outputexcludes">DOMINCLUDE</a>, <a href="#outputexcludes">DOMEXCLUDE</a>, <a href="#outputexcludes">ORGINCLUDE</a>, <a href="#outputexcludes">ORGEXCLUDE</a>, <a href="#outputexcludes">REFREPINCLUDE</a>, <a href="#outputexcludes">REFREPEXCLUDE</a>, <a href="#outputexcludes">REFSITEINCLUDE</a>, <a href="#outputexcludes">REFSITEEXCLUDE</a>, <a href="#outputexcludes">SEARCHQUERYINCLUDE</a>, <a href="#outputexcludes">SEARCHQUERYEXCLUDE</a>, <a href="#outputexcludes">SEARCHWORDINCLUDE</a>, <a href="#outputexcludes">SEARCHWORDEXCLUDE</a>, <a href="#outputexcludes">REDIRREFINCLUDE</a>, <a href="#outputexcludes">REDIRREFEXCLUDE</a>, <a href="#outputexcludes">FAILREFINCLUDE</a>, <a href="#outputexcludes">FAILREFEXCLUDE</a>, <a href="#outputexcludes">BROWSUMINCLUDE</a>, <a href="#outputexcludes">BROWSUMEXCLUDE</a>, <a href="#outputexcludes">FULLBROWINCLUDE</a>, <a href="#outputexcludes">FULLBROWEXCLUDE</a>, <a href="#outputexcludes">OSINCLUDE</a>, <a href="#outputexcludes">OSEXCLUDE</a>, <a href="#outputexcludes">VHOSTREPINCLUDE</a>, <a href="#outputexcludes">VHOSTREPEXCLUDE</a>, <a href="#outputexcludes">USERREPINCLUDE</a>, <a href="#outputexcludes">USERREPEXCLUDE</a>, <a href="#outputexcludes">FAILUSERINCLUDE</a>, <a href="#outputexcludes">FAILUSEREXCLUDE</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND *item <br>COMMAND ("REGEXP:" | "REGEXPI:")regexp</kbd> <dt><i>Notes</i> <dd>Excludes an excluded item from one report only. 
Regular expressions are only available on some platforms. <dt><i>Example</i> <dd><kbd>REQINCLUDE pages</kbd> <p><dt><i>4. Syntax (miscellaneous)</i> <dd><kbd> <a href="#PAGEINCLUDE">PAGEINCLUDE</a> *file <br><a href="#PAGEINCLUDE">PAGEEXCLUDE</a> *file <br><a href="#ARGSINCLUDE">ARGSINCLUDE</a> *file <br><a href="#ARGSINCLUDE">ARGSEXCLUDE</a> *file <br><a href="#ARGSINCLUDE">REFARGSINCLUDE</a> *referrer <br><a href="#ARGSINCLUDE">REFARGSEXCLUDE</a> *referrer</kbd> <dt><i>Notes</i> <dd>These can be regular expressions too, on suitable platforms. <dt><i>Example</i> <dd><kbd> PAGEINCLUDE *.asp </kbd> </dl> <h3><a name="quickdns">DNS commands</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <a href="#dns">DNSFILE</a> localfile <br><a href="#dns">DNS</a> ("NONE" | "READ" | "LOOKUP" | "WRITE") <br><a href="#dns">DNSLOCKFILE</a> localfile <br><a href="#dns">DNSGOODHOURS</a> number <br><a href="#dns">DNSBADHOURS</a> number</kbd> <dt><i>Examples</i> <dd><kbd> DNSFILE dnscache.txt <br>DNS WRITE <br>DNSBADHOURS 48 </kbd> </dl> <h3><a name="quicksub">Sub-item commands</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <a href="#hierreps">SUBDIR</a> *file <br><a href="#hierreps">SUBDOMAIN</a> *subdomain <br><a href="#hierreps">SUBORG</a> *subdomain <br><a href="#hierreps">SUBTYPE</a> *extension <br><a href="#hierreps">SUBBROW</a> *browser <br><a href="#hierreps">REFDIR</a> *referrer</kbd> <dt><i>Examples</i> <dd><kbd> SUBDIR /jim/*/* <br>SUBTYPE *.gz </kbd> </dl> <h3><a name="quicklowmem"><kbd>LOWMEM</kbd> commands</a></h3> <dl> <dt><i>Commands</i> <dd><kbd><a href="#lowmem">FILELOWMEM</a>, <a href="#lowmem">HOSTLOWMEM</a>, <a href="#lowmem">BROWLOWMEM</a>, <a href="#lowmem">REFLOWMEM</a>, <a href="#lowmem">USERLOWMEM</a>, <a href="#lowmem">VHOSTLOWMEM</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND ("0" | "1" | "2" | "3")</kbd> <dt><i>Example</i> <dd><kbd> HOSTLOWMEM 3 </kbd> </dl> <h3><a name="quickrep">Report commands</a></h3> <dl> <dt><i>Commands</i> <dd><kbd><a href="#replist">GENERAL</a>, 
<a href="#replist">ALL</a>, <a href="#replist">MONTHLY</a>, <a href="#replist">WEEKLY</a>, <a href="#replist">FULLDAILY</a>, <a href="#replist">DAILY</a>, <a href="#replist">FULLHOURLY</a>, <a href="#replist">HOURLY</a>, <a href="#replist">QUARTER</a>, <a href="#replist">FIVE</a>, <a href="#replist">HOST</a>, <a href="#replist">ORGANISATION</a>, <a href="#replist">DOMAIN</a>, <a href="#replist">REQUEST</a>, <a href="#replist">DIRECTORY</a>, <a href="#replist">FILETYPE</a>, <a href="#replist">SIZE</a>, <a href="#replist">PROCTIME</a>, <a href="#replist">REDIR</a>, <a href="#replist">FAILURE</a>, <a href="#replist">REFERRER</a>, <a href="#replist">REFSITE</a>, <a href="#replist">SEARCHQUERY</a>, <a href="#replist">SEARCHWORD</a>, <a href="#replist">REDIRREF</a>, <a href="#replist">FAILREF</a>, <a href="#replist">FULLBROWSER</a>, <a href="#replist">BROWSER</a>, <a href="#replist">OSREP</a>, <a href="#replist">VHOST</a>, <a href="#replist">USER</a>, <a href="#replist">FAILUSER</a>, <a href="#replist">STATUS</a></kbd> <dt><i>Syntax</i> <dd><kbd> REPORTCOMMAND ("ON" | "OFF")</kbd> <dt><i>Examples</i> <dd><kbd> ALL OFF <br>FULLHOURLY ON </kbd> </dl> <h3><a name="quickgraph"><kbd>GRAPH</kbd> commands</a></h3> <dl> <dt><i>Commands</i> <dd><kbd><a href="#GRAPH">ALLGRAPH</a>, <a href="#GRAPH">MONTHGRAPH</a>, <a href="#GRAPH">WEEKGRAPH</a>, <a href="#GRAPH">DAYGRAPH</a>, <a href="#GRAPH">FULLDAYGRAPH</a>, <a href="#GRAPH">HOURGRAPH</a>, <a href="#GRAPH">FULLHOURGRAPH</a>, <a href="#GRAPH">QUARTERGRAPH</a>, <a href="#GRAPH">FIVEGRAPH</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND ("R" | "r" | "P" | "p" | "B" | "b")</kbd> <dt><i>Example</i> <dd><kbd> ALLGRAPH B </kbd> </dl> <h3><a name="quickback"><kbd>BACK</kbd> commands</a></h3> <dl> <dt><i>Commands</i> <dd><kbd><a href="#BACK">ALLBACK</a>, <a href="#BACK">MONTHBACK</a>, <a href="#BACK">WEEKBACK</a>, <a href="#BACK">FULLDAYBACK</a>, <a href="#BACK">FULLHOURBACK</a>, <a href="#BACK">QUARTERBACK</a>, <a 
href="#BACK">FIVEBACK</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND ("ON" | "OFF")</kbd> <dt><i>Example</i> <dd><kbd> ALLBACK ON </kbd> </dl> <h3><a name="quickrows"><kbd>ROWS</kbd> commands</a></h3> <dl> <dt><i>Commands</i> <dd><kbd><a href="#ROWS">MONTHROWS</a>, <a href="#ROWS">WEEKROWS</a>, <a href="#ROWS">FULLDAYROWS</a>, <a href="#ROWS">FULLHOURROWS</a>, <a href="#ROWS">QUARTERROWS</a>, <a href="#ROWS">FIVEROWS</a></kbd> <dt><i>Syntax</i> <dd><kbd> COMMAND number</kbd> <dt><i>Example</i> <dd><kbd> QUARTERROWS 192 </kbd> </dl> <h3><a name="quickcols"><kbd>COLS</kbd> commands</a></h3> <dl> <dt><i>1. Commands (time reports)</i> <dd><kbd> <a href="#timeCOLS">TIMECOLS</a>, <a href="#timeCOLS">MONTHCOLS</a>, <a href="#timeCOLS">WEEKCOLS</a>, <a href="#timeCOLS">DAYCOLS</a>, <a href="#timeCOLS">FULLDAYCOLS</a>, <a href="#timeCOLS">HOURCOLS</a>, <a href="#timeCOLS">FULLHOURCOLS</a>, <a href="#timeCOLS">QUARTERCOLS</a>, <a href="#timeCOLS">FIVECOLS</a></kbd> <dt><i>Syntax</i> <dd><kbd>cols1 := subset("RrPpBb") <br>COMMAND cols1</kbd> <dt><i>Example</i> <dd><kbd> MONTHCOLS bRP </kbd> <p><dt><i>2. Commands (most success reports)</i> <dd><kbd> <a href="#othCOLS">HOSTCOLS</a>, <a href="#othCOLS">ORGCOLS</a>, <a href="#othCOLS">DOMCOLS</a>, <a href="#othCOLS">DIRCOLS</a>, <a href="#othCOLS">REFCOLS</a>, <a href="#othCOLS">REFSITECOLS</a>, <a href="#othCOLS">SEARCHQUERYCOLS</a>, <a href="#othCOLS">SEARCHWORDCOLS</a>, <a href="#othCOLS">FULLBROWCOLS</a>, <a href="#othCOLS">BROWCOLS</a>, <a href="#othCOLS">OSCOLS</a>, <a href="#othCOLS">VHOSTCOLS</a>, <a href="#othCOLS">USERCOLS</a></kbd> <dt><i>Syntax</i> <dd><kbd>cols2 := subset("NDRrPpBb") <br>COMMAND cols2</kbd> <dt><i>Example</i> <dd><kbd> USERCOLS BD </kbd> <p><dt><i>3. 
Commands (Request and File Type Reports)</i> <dd><kbd> <a href="#othCOLS">REQCOLS</a>, <a href="#othCOLS">TYPECOLS</a></kbd> <dt><i>Syntax</i> <dd><kbd>cols3 := subset("NDRrpBb") <br>COMMAND cols3</kbd> <dt><i>Example</i> <dd><kbd> TYPECOLS NRb </kbd> <p><dt><i>4. Commands (failure, redirection and Status Code reports)</i> <dd><kbd> <a href="#othCOLS">REDIRCOLS</a>, <a href="#othCOLS">FAILCOLS</a>, <a href="#othCOLS">REDIRREFCOLS</a>, <a href="#othCOLS">FAILREFCOLS</a>, <a href="#othCOLS">FAILUSERCOLS</a>, <a href="#othCOLS">STATUSCOLS</a></kbd> <dt><i>Syntax</i> <dd><kbd>cols4 := subset("NDRr") <br>COMMAND cols4</kbd> <dt><i>Example</i> <dd><kbd> FAILCOLS D </kbd> <p><dt><i>5. Commands (Size and Processing Time Reports)</i> <dd><kbd> <a href="#othCOLS">SIZECOLS</a>, <a href="#othCOLS">PROCTIMECOLS</a></kbd> <dt><i>Syntax</i> <dd><kbd>cols5 := subset("DRrPpBb") <br>COMMAND cols5</kbd> <dt><i>Example</i> <dd><kbd> SIZECOLS RB </kbd> </dl> <h3><a name="quicksortby"><kbd>SORTBY</kbd> commands</a></h3> <dl> <dt><i>1. 
Commands (most success reports)</i> <dd><kbd> <a href="#SORTBY">HOSTSORTBY</a>, <a href="#SORTBY">ORGSORTBY</a>, <a href="#SORTBY">DOMSORTBY</a>, <a href="#SORTBY">DIRSORTBY</a>, <a href="#SORTBY">REFSORTBY</a>, <a href="#SORTBY">REFSITESORTBY</a>, <a href="#SORTBY">SEARCHQUERYSORTBY</a>, <a href="#SORTBY">SEARCHWORDSORTBY</a>, <a href="#SORTBY">FULLBROWSORTBY</a>, <a href="#SORTBY">BROWSORTBY</a>, <a href="#SORTBY">OSSORTBY</a>, <a href="#SORTBY">VHOSTSORTBY</a>, <a href="#SORTBY">USERSORTBY</a>, <a href="#SUBSORTBY">SUBDIRSORTBY</a>, <a href="#SUBSORTBY">SUBDOMSORTBY</a>, <a href="#SUBSORTBY">SUBORGSORTBY</a>, <a href="#SUBSORTBY">SUBBROWSORTBY</a>, <a href="#ARGSSORTBY">SUBOSSORTBY</a>, <a href="#SUBSORTBY">REFDIRSORTBY</a>, <a href="#ARGSSORTBY">REFARGSSORTBY</a></kbd> <dt><i>Syntax</i> <dd><kbd>sortby1 := ("REQUESTS" | "PAGES" | "BYTES" | "DATE" | "ALPHABETICAL" | "RANDOM") <br>COMMAND sortby1</kbd> <dt><i>Example</i> <dd><kbd> DOMSORTBY ALPHABETICAL </kbd> <p><dt><i>2. Commands (Request and File Type Reports)</i> <dd><kbd> <a href="#SORTBY">REQSORTBY</a>, <a href="#SORTBY">TYPESORTBY</a>, <a href="#ARGSSORTBY">REQARGSSORTBY</a>, <a href="#SUBSORTBY">SUBTYPESORTBY</a></kbd> <dt><i>Syntax</i> <dd><kbd>sortby2 := ("REQUESTS" | "BYTES" | "DATE" | "ALPHABETICAL" | "RANDOM") <br>COMMAND sortby2</kbd> <dt><i>Example</i> <dd><kbd> REQSORTBY REQUESTS </kbd> <p><dt><i>3. 
Commands (failure, redirection and Status Code reports)</i> <dd><kbd> <a href="#SORTBY">REDIRSORTBY</a>, <a href="#SORTBY">FAILSORTBY</a>, <a href="#SORTBY">REDIRREFSORTBY</a>, <a href="#SORTBY">FAILREFSORTBY</a>, <a href="#SORTBY">FAILUSERSORTBY</a>, <a href="#SORTBY">STATUSSORTBY</a>, <a href="#ARGSSORTBY">REDIRARGSSORTBY</a>, <a href="#ARGSSORTBY">FAILARGSSORTBY</a>, <a href="#ARGSSORTBY">REDIRREFARGSSORTBY</a>, <a href="#ARGSSORTBY">FAILREFARGSSORTBY</a></kbd> <dt><i>Syntax</i> <dd><kbd>sortby3 := ("REQUESTS" | "DATE" | "ALPHABETICAL" | "RANDOM") <br>COMMAND sortby3</kbd> <dt><i>Example</i> <dd><kbd> FAILSORTBY DATE </kbd> </dl> <h3><a name="quickfloor"><kbd>FLOOR</kbd> commands</a></h3> <dl> <dt><i>Commands (top-level)</i> <dd><kbd> <a href="#FLOOR">HOSTFLOOR</a>, <a href="#FLOOR">ORGFLOOR</a>, <a href="#FLOOR">DOMFLOOR</a>, <a href="#FLOOR">REQFLOOR</a>, <a href="#FLOOR">DIRFLOOR</a>, <a href="#FLOOR">TYPEFLOOR</a>, <a href="#FLOOR">REDIRFLOOR</a>, <a href="#FLOOR">FAILFLOOR</a>, <a href="#FLOOR">REFFLOOR</a>, <a href="#FLOOR">REFSITEFLOOR</a>, <a href="#FLOOR">SEARCHQUERYFLOOR</a>, <a href="#FLOOR">SEARCHWORDFLOOR</a>, <a href="#FLOOR">REDIRREFFLOOR</a>, <a href="#FLOOR">FAILREFFLOOR</a>, <a href="#FLOOR">FULLBROWFLOOR</a>, <a href="#FLOOR">BROWFLOOR</a>, <a href="#FLOOR">OSFLOOR</a>, <a href="#FLOOR">VHOSTFLOOR</a>, <a href="#FLOOR">USERFLOOR</a>, <a href="#FLOOR">FAILUSERFLOOR</a>, <a href="#FLOOR">STATUSFLOOR</a></kbd> <dt><i>Commands (lower levels)</i> <dd><kbd> <a href="#ARGSFLOOR">REQARGSFLOOR</a>, <a href="#ARGSFLOOR">REDIRARGSFLOOR</a>, <a href="#ARGSFLOOR">FAILARGSFLOOR</a>, <a href="#ARGSFLOOR">REFARGSFLOOR</a>, <a href="#ARGSFLOOR">REDIRREFARGSFLOOR</a>, <a href="#ARGSFLOOR">FAILREFARGSFLOOR</a>, <a href="#SUBFLOOR">SUBDIRFLOOR</a>, <a href="#SUBFLOOR">SUBDOMFLOOR</a>, <a href="#SUBFLOOR">SUBORGFLOOR</a>, <a href="#SUBFLOOR">SUBTYPEFLOOR</a>, <a href="#SUBFLOOR">SUBBROWFLOOR</a>, <a href="#ARGSFLOOR">SUBOSFLOOR</a>, <a 
href="#SUBFLOOR">REFDIRFLOOR</a></kbd> <dt><i>Syntax</i> <dd><kbd> partdate := ["+" | "-"] digit digit <br>date := partdate partdate partdate [":" partdate partdate] <br>COMMAND number ("r" | "p") <br>COMMAND number ["k" | "M" | "G" | "T"] "b" <br>COMMAND real ("%" | ":") ("r" | "p" | "b") <br>COMMAND date "d" <br>COMMAND "-" number ("r" | "p" | "b" | "d")</kbd> <dt><i>Notes</i> <dd>Actually, this syntax isn't quite correct. <kbd>REQFLOOR</kbd>, <kbd>TYPEFLOOR</kbd>, <kbd>REQARGSFLOOR</kbd> and <kbd>SUBTYPEFLOOR</kbd> aren't allowed to be of type <kbd>"p"</kbd>; and <kbd>REDIRFLOOR</kbd>, <kbd>FAILFLOOR</kbd>, <kbd>REDIRREFFLOOR</kbd>, <kbd>FAILREFFLOOR</kbd>, <kbd>FAILUSERFLOOR</kbd>, <kbd>STATUSFLOOR</kbd>, <kbd>REDIRARGSFLOOR</kbd>, <kbd>FAILARGSFLOOR</kbd>, <kbd>REDIRREFARGSFLOOR</kbd> and <kbd>FAILREFARGSFLOOR</kbd> aren't allowed to be of types <kbd>"p"</kbd> or <kbd>"b"</kbd>. <dt><i>Examples</i> <dd><kbd> TYPEFLOOR -20r <br>REQARGSFLOOR 0.1%b </kbd> </dl> <h3><a name="quicklinks">Hyperlinks</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <a href="#LINKINCLUDE">LINKINCLUDE</a> *file <br><a href="#LINKINCLUDE">LINKEXCLUDE</a> *file <br><a href="#LINKINCLUDE">REFLINKINCLUDE</a> *referrer <br><a href="#LINKINCLUDE">REFLINKEXCLUDE</a> *referrer <br><a href="#BASEURL">BASEURL</a> prefix_string</kbd> <dt><i>Examples</i> <dd><kbd> LINKINCLUDE pages <br>REFLINKINCLUDE *.cgi <br>BASEURL http://www.mycompany.com </kbd> </dl> <h3><a name="quicklang">Language commands</a></h3> <dl> <dt><i>Syntax</i> <dd><pre>
<a href="#LANGUAGE">LANGUAGE</a> ("ARMENIAN" | "BOSNIAN" | "CATALAN" | "SIMP-CHINESE" |
          "TRAD-CHINESE" | "CZECH" | "DANISH" | "DUTCH" | "ENGLISH" |
          "US-ENGLISH" | "FINNISH" | "FRENCH" | "GERMAN" | "GREEK" |
          "ITALIAN" | "JAPANESE" | "NORWEGIAN" | "NYNORSK" | "POLISH" |
          "PORTUGUESE" | "BR-PORTUGUESE" | "RUSSIAN" | "SERBIAN" | "SLOVAK" |
          "SLOVENE" | "SPANISH" | "SWEDISH" | "TURKISH" | "UKRAINIAN")
<a href="#LANGUAGE">LANGFILE</a> localfile
<a href="#domfile">DOMAINSFILE</a> localfile
</pre> <dt><i>Notes</i> <dd><a href="#LANGUAGE">Other languages</a> were available in <a href="http://www.statslab.cam.ac.uk/~sret1/analog/">version 3</a> of analog, and should be available for version 4 soon. <dt><i>Examples</i> <dd><kbd> LANGUAGE ITALIAN <br>LANGFILE lang/hindi.lng </kbd> </dl> <h3><a name="quickcosmetic">Cosmetic and miscellaneous commands</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <br><a href="#outstyle">OUTPUT</a> ("HTML" | "ASCII" | "COMPUTER" | "NONE") <br><a href="#GOTOS">GOTOS</a> ("ON" | "OFF" | "FEW") <br><a href="#RUNTIME">RUNTIME</a> ("ON" | "OFF") <br><a href="#LASTSEVEN">LASTSEVEN</a> ("ON" | "OFF") <br><a href="#REPORTORDER">REPORTORDER</a> perm("xcmdDhH45WriSoEItzsfKkuJvbB") <br><a href="#IMAGEDIR">IMAGEDIR</a> URL <br><a href="#NOROBOTS">NOROBOTS</a> ("ON" | "OFF") <br><a href="#LOGO">LOGO</a> (URL | "none") <br><a href="#HOSTNAME">HOSTNAME</a> string <br><a href="#HOSTNAME">HOSTURL</a> (URL | "none") <br><a href="#HEADERFILE">HEADERFILE</a> (localfile | "none") <br><a href="#HEADERFILE">FOOTERFILE</a> (localfile | "none") <br><a href="#STYLESHEET">STYLESHEET</a> (URL | "none") <br><a href="#SEPCHAR">SEPCHAR</a> (char | "none") <br><a href="#SEPCHAR">REPSEPCHAR</a> (char | "none") <br><a href="#SEPCHAR">DECPOINT</a> char <br><a href="#compout">COMPSEP</a> string <br><a href="#RAWBYTES">RAWBYTES</a> ("ON" | "OFF") <br><a href="#PAGEWIDTH">HTMLPAGEWIDTH</a> number <br><a href="#PAGEWIDTH">ASCIIPAGEWIDTH</a> number <br><a href="#BARSTYLE">BARSTYLE</a> ("a" | "b" | "c" | "d" | "e" | "f" | "g" | "h") <br><a href="#MARKCHAR">MARKCHAR</a> char <br><a href="#MINGRAPHWIDTH">MINGRAPHWIDTH</a> number <br><a href="#WEEKBEGINSON">WEEKBEGINSON</a> ("SUNDAY" | "MONDAY" | "TUESDAY" | "WEDNESDAY" | "THURSDAY" | "FRIDAY" | "SATURDAY") <br><a href="#SEARCHENGINE">SEARCHENGINE</a> *referrer comma-separated-strings </kbd> <dt><i>Examples</i> <dd>Too many to list. See the documentation on each individual command.
</dl> <h3><a name="quickdebug">Diagnostics</a></h3> <dl> <dt><i>Syntax</i> <dd><kbd> <a href="#settings">SETTINGS</a> ("ON" | "OFF") <br><a href="#debugs">DEBUG</a> ("ON" | "OFF" | ["+" | "-"] subset("CDFSU")) <br><a href="#WARNINGS">WARNINGS</a> ("ON" | "OFF" | ["+" | "-"] subset("CDEFLMR")) <br><a href="#PROGRESSFREQ">PROGRESSFREQ</a> number <br><a href="#ERRFILE">ERRFILE</a> localfile <br><a href="#ERRLINELENGTH">ERRLINELENGTH</a> number</kbd> <dt><i>Examples</i> <dd><kbd> DEBUG ON <br>DEBUG CF <br>WARNINGS -DL <br>PROGRESSFREQ 50000 </kbd> </dl> <hr> <hr> <a name="indx"><h2>Index</h2> </a> [ <a href="#A">A</a> | <a href="#B">B</a> | <a href="#C">C</a> | <a href="#D">D</a> | <a href="#E">E</a> | <a href="#F">F</a> | <a href="#G">G</a> | <a href="#H">H</a> | <a href="#I">I</a> | J | K | <a href="#L">L</a> | <a href="#M">M</a> | <a href="#N">N</a> | <a href="#O">O</a> | <a href="#P">P</a> | <a href="#Q">Q</a> | <a href="#R">R</a> | <a href="#S">S</a> | <a href="#T">T</a> | <a href="#U">U</a> | <a href="#V">V</a> | <a href="#W">W</a> | X | <a href="#Y">Y</a> | Z ] <p> This is the index for this Readme. Follow the numbers after each name to find references to that command or concept. Note that families of commands are indexed under the second part of the name: for example, <kbd>HOSTEXCLUDE</kbd> is under <kbd>*EXCLUDE</kbd>, not under <kbd>HOST</kbd>. <p> This index includes all of analog's configuration commands: if a command you used in previous versions is not here, see the section on <cite><a href="#update">Upgrading from earlier versions</a></cite>. All commands are also listed in the <cite><a href="#quickref">Quick reference</a></cite> with their syntax and examples, and that section is not cross-referenced from this index. 
<p><a name="A">Acknowledgements</a> [<a href="#acknow">1</a>] <br>Addresses, numerical [<a href="#dns">1</a>] <br><kbd>*ALIAS</kbd> [<a href="#alias">1</a>] <br>Aliases [<a href="#alias">1</a>] <br><kbd>ALL</kbd> [<a href="#ONOFF">1</a>] <br><kbd>ALLBACK</kbd> [<a href="#BACK">1</a>] <br><kbd>ALLGRAPH</kbd> [<a href="#GRAPH">1</a>] <br><kbd>analog.cfg</kbd> [<a href="#startmac">1</a>][<a href="#startpc">2</a>][<a href="#startos2">3</a>][<a href="#startux">4</a>][<a href="#specialcfgs">5</a>] <br><kbd>anlgform.html</kbd> [<a href="#form">1</a>] <br><kbd>anlgform.pl</kbd> [<a href="#form">1</a>] <br><kbd>anlghead.h</kbd> [<a href="#startux">1</a>][<a href="#syntax">2</a>] <br>Announcements [<a href="#mailing">1</a>] <br><kbd>APACHEDEFAULTLOGFORMAT</kbd> [<a href="#DEFAULTLOGFORMAT">1</a>] <br><kbd>APACHELOGFORMAT</kbd> [<a href="#Apache">1</a>] <br><kbd>ARGSEXCLUDE</kbd> [<a href="#ARGSINCLUDE">1</a>] <br><kbd>*ARGSFLOOR</kbd> [<a href="#ARGSFLOOR">1</a>] <br><kbd>ARGSINCLUDE</kbd> [<a href="#ARGSINCLUDE">1</a>] <br><kbd>*ARGSSORTBY</kbd> [<a href="#ARGSSORTBY">1</a>] <br>Arguments in URLs [<a href="#args">1</a>][<a href="#ARGSFLOOR">2</a>] <br>ASCII output [<a href="#outstyle">1</a>] <br><kbd>ASCIIPAGEWIDTH</kbd> [<a href="#PAGEWIDTH">1</a>] <br><kbd><a name="B">*BACK</a></kbd> [<a href="#BACK">1</a>] <br>Bar charts [<a href="#timereps">1</a>] <br><kbd>BARSTYLE</kbd> [<a href="#BARSTYLE">1</a>] <br><kbd>BASEURL</kbd> [<a href="#BASEURL">1</a>] <br>Basic commands [<a href="#basiccmd">1</a>] <br>Broken pipe [<a href="#brokenpipe">1</a>][<a href="#UNCOMPRESS">2</a>] <br><kbd>BROW*</kbd> commands - see under second part of name <br><kbd>BROWSER</kbd> [<a href="#replist">1</a>] <br>Browser Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br>Browser Summary [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>BROWSUM*</kbd> commands - see under second part of name <br>Bugs, 
reporting [<a href="#mailing">1</a>] <br>Bytes, how displayed [<a href="#RAWBYTES">1</a>] <br><a name="C">Cache files</a> [<a href="#cache">1</a>] <br><kbd>CACHEOUTFILE</kbd> [<a href="#cache">1</a>] <br><kbd>CACHEFILE</kbd> [<a href="#cache">1</a>] <br><kbd>CASE</kbd> [<a href="#CASE">1</a>] <br>CGI program [<a href="#form">1</a>] <br>"Click-thru"s [<a href="#defns">1</a>] <br>Colours [<a href="#STYLESHEET">1</a>] <br><kbd>*COLS</kbd> [<a href="#timeCOLS">1</a>][<a href="#othCOLS">2</a>] <br><a name="clargs">Command line arguments</a> [<a href="#syntax">1</a>][<a href="#startpc">2</a>][<a href="#startos2">3</a>][<a href="#startux">4</a>] <ul> <li>logfile name (<kbd>LOGFILE</kbd>) [<a href="#logfile">1</a>] <li><kbd>-</kbd> (<kbd>LOGFILE stdin</kbd>) [<a href="#logfile">1</a>] <li><kbd>4</kbd> (Quarter-Hour Report) [<a href="#replist">1</a>] <li><kbd>5</kbd> (Five-Minute Report) [<a href="#replist">1</a>] <li><kbd>A</kbd> (All reports) [<a href="#replist">1</a>] <li><kbd>a</kbd> (HTML/ASCII output) [<a href="#outstyle">1</a>] <li><kbd>B</kbd> (Browser Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>b</kbd> (Browser Summary) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>c</kbd> (Status Code Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>C</kbd> (Arbitrary configuration command) [<a href="#plusC">1</a>] <li><kbd>D</kbd> (Daily Report) [<a href="#replist">1</a>] <li><kbd>d</kbd> (Daily Summary) [<a href="#replist">1</a>] <li><kbd>E</kbd> (Redirection Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>F</kbd> (<kbd>FROM</kbd> date) [<a href="#FROMTO">1</a>] <li><kbd>f</kbd> (Referrer Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>G</kbd> (Default configuration file) [<a href="#specialcfgs">1</a>] <li><kbd>g</kbd> (Other configuration files) [<a href="#CONFIGFILE">1</a>] <li><kbd>H</kbd> (Hourly Report) [<a href="#replist">1</a>] <li><kbd>h</kbd> (Hourly Summary) 
[<a href="#replist">1</a>] <li><kbd>I</kbd> (Failure Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>i</kbd> (Directory Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>J</kbd> (Failed User Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>K</kbd> (Failed Referrer Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>k</kbd> (Redirected Referrer Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>m</kbd> (Monthly Report) [<a href="#replist">1</a>] <li><kbd>N</kbd> (Search Query Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>n</kbd> (Search Word Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>O</kbd> (Output file) [<a href="#OUTFILE">1</a>] <li><kbd>o</kbd> (Domain Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>P</kbd> (Processing Time Report) [<a href="#replist">1</a>] <li><kbd>p</kbd> (Operating System Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>q</kbd> (Warnings) [<a href="#WARNINGS">1</a>] <li><kbd>r</kbd> (Request Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>S</kbd> (Host Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>s</kbd> (Referring Site Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>settings</kbd> (Settings of all variables) [<a href="#settings">1</a>][<a href="#debug">2</a>] <li><kbd>T</kbd> (<kbd>TO</kbd> date) [<a href="#FROMTO">1</a>] <li><kbd>t</kbd> (File Type Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>U</kbd> (Cache file) [<a href="#cache">1</a>] <li><kbd>u</kbd> (User Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>V</kbd> (Debugging) [<a href="#debugs">1</a>] <li><kbd>v</kbd> (Virtual Host Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>W</kbd> (Weekly Report) [<a href="#replist">1</a>] <li><kbd>X</kbd> 
(Goto's) [<a href="#GOTOS">1</a>] <li><kbd>x</kbd> (General Summary) [<a href="#replist">1</a>] <li><kbd>Z</kbd> (Organisation Report) [<a href="#replist">1</a>][<a href="#othclarg">2</a>] <li><kbd>z</kbd> (File Size Report) [<a href="#replist">1</a>] </ul> <br>Compilation problems [<a href="#startux">1</a>] <br>Compiling [<a href="#startux">1</a>] <br>Compressed logfiles [<a href="#UNCOMPRESS">1</a>] <br><kbd>COMPSEP</kbd> [<a href="#compout">1</a>] <br>Computer-readable output style [<a href="#compout">1</a>] <br><kbd>CONFIGFILE</kbd> [<a href="#CONFIGFILE">1</a>] <br>Configuration files [<a href="#startmac">1</a>][<a href="#startpc">2</a>][<a href="#startos2">3</a>][<a href="#startux">4</a>][<a href="#syntax">5</a>] <br>Configuration file, default [<a href="#specialcfgs">1</a>] <br>Configuration file, mandatory [<a href="#specialcfgs">1</a>] <br>Contents [<a href="#map">1</a>] <br>Contributors [<a href="#acknow">1</a>] <br>Cookies [<a href="#fmtstrings">1</a>] <br>Corrupt logfile lines, definition [<a href="#defns">1</a>] <br>Countries [<a href="#domfile">1</a>] <br>Crashes [<a href="#errors">1</a>] <br>Customising analog [<a href="#custom">1</a>] <br><kbd><a name="D">DAILY</a></kbd> [<a href="#replist">1</a>] <br>Daily Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>Daily Summary [<a href="#reptimesum">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>Date reports [<a href="#reptime">1</a>][<a href="#timereps">2</a>] <br>Dates, restricting [<a href="#FROMTO">1</a>] <br><kbd>DAY*</kbd> commands - see under second part of name <br>Debugging [<a href="#debug">1</a>] <br><kbd>DECPOINT</kbd> [<a href="#SEPCHAR">1</a>] <br>Default configuration file [<a href="#specialcfgs">1</a>] <br>Default logfile format [<a href="#DEFAULTLOGFORMAT">1</a>] <br><kbd>DEFAULTLOGFORMAT</kbd> [<a href="#DEFAULTLOGFORMAT">1</a>] <br>Definitions [<a href="#defns">1</a>] <br><kbd>DIR*</kbd> commands - see under second part of name 
<br><kbd>DIRECTORY</kbd> [<a href="#replist">1</a>] <br>Directory Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br><kbd>DIRSUFFIX</kbd> [<a href="#DIRSUFFIX">1</a>] <br><kbd>DNS</kbd> [<a href="#dns">1</a>] <br>DNS lookups [<a href="#dns">1</a>] <br><kbd>DNSBADHOURS</kbd> [<a href="#dns">1</a>] <br><kbd>DNSFILE</kbd> [<a href="#dns">1</a>] <br><kbd>DNSGOODHOURS</kbd> [<a href="#dns">1</a>] <br><kbd>DNSLOCKFILE</kbd> [<a href="#dns">1</a>] <br><kbd>DOM*</kbd> commands - see under second part of name <br><kbd>DOMAIN</kbd> [<a href="#replist">1</a>] <br>Domain Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>][<a href="#domfile">5</a>] <br>Domains file [<a href="#domfile">1</a>] <br><kbd>DOMAINSFILE</kbd> [<a href="#domfile">1</a>] <br><kbd><a name="E">ERRFILE</a></kbd> [<a href="#ERRFILE">1</a>] <br><kbd>ERRLINELENGTH</kbd> [<a href="#ERRLINELENGTH">1</a>] <br>error_log [<a href="#designfaq">1</a>][<a href="#abolished290b1">2</a>] <br>Error Report [<a href="#abolished290b1">1</a>] <br>Errors [<a href="#errors">1</a>] <br>Example reports [<a href="#Readme">1</a>] <br>Examples of each command [<a href="#quickref">1</a>] <br><kbd>*EXCLUDE</kbd> [<a href="#include">1</a>] <br>Exclusions [<a href="#include">1</a>] <br><kbd><a name="F">FAIL*</a></kbd> commands - see under second part of name <br>Failed Referrer Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br>Failed requests, definition [<a href="#defns">1</a>] <br>Failed User Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>FAILREF</kbd> [<a href="#replist">1</a>] <br><kbd>FAILREF*</kbd> commands - see under second part of name <br><kbd>FAILURE</kbd> [<a href="#replist">1</a>] <br>Failure Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a 
href="#hierreps">4</a>] <br><kbd>FAILUSER</kbd> [<a href="#replist">1</a>] <br><kbd>FAILUSER*</kbd> commands - see under second part of name <br>FAQ [<a href="#faq">1</a>] <br>Fatal errors [<a href="#errors">1</a>] <br><kbd>FILE*</kbd> commands - see under second part of name <br>File, definition [<a href="#defns">1</a>] <br>File Size Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br>File Type Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br><kbd>FILETYPE</kbd> [<a href="#replist">1</a>] <br>Filters [<a href="#include">1</a>] <br>First day of week [<a href="#WEEKBEGINSON">1</a>] <br><kbd>FIVE</kbd> [<a href="#replist">1</a>] <br><kbd>FIVE*</kbd> commands - see under second part of name <br>Five-Minute Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br><kbd>*FLOOR</kbd> [<a href="#FLOOR">1</a>][<a href="#SUBFLOOR">2</a>][<a href="#ARGSFLOOR">3</a>] <br><kbd>FOOTERFILE</kbd> [<a href="#HEADERFILE">1</a>] <br>Form interface [<a href="#form">1</a>] <br>Frequently Asked Questions [<a href="#faq">1</a>] <br><kbd>FROM</kbd> [<a href="#FROMTO">1</a>] <br><kbd>FULLBROW*</kbd> commands - see under second part of name <br><kbd>FULLBROWSER</kbd> [<a href="#replist">1</a>] <br><kbd>FULLDAILY</kbd> [<a href="#replist">1</a>] <br><kbd>FULLDAY*</kbd> commands - see under second part of name <br><kbd>FULLHOUR*</kbd> commands - see under second part of name <br><kbd>FULLHOURLY</kbd> [<a href="#replist">1</a>] <br><kbd><a name="G">GENERAL</a></kbd> [<a href="#replist">1</a>] <br>General Summary [<a href="#repgen">1</a>][<a href="#replist">2</a>] <br><kbd>GOTOS</kbd> [<a href="#replist">1</a>] <br><kbd>*GRAPH</kbd> [<a href="#GRAPH">1</a>] <br>Graphs [<a href="#timereps">1</a>] <br><kbd><a name="H">HEADERFILE</a></kbd> [<a href="#HEADERFILE">1</a>] <br>Helper applications [<a href="#helpers">1</a>] <br>Hierarchical reports [<a 
href="#hierreps">1</a>] <br>Hits [<a href="#defns">1</a>] <br>Home page [<a href="#Readme">1</a>] <br><kbd>HOST</kbd> [<a href="#replist">1</a>] <br><kbd>HOST*</kbd> commands - see under second part of name <br>Host, definition [<a href="#defns">1</a>] <br>Host Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>HOSTNAME</kbd> [<a href="#HOSTNAME">1</a>] <br>Hostnames, numerical [<a href="#dns">1</a>] <br><kbd>HOSTREP*</kbd> commands - see under second part of name <br><kbd>HOSTURL</kbd> [<a href="#HOSTNAME">1</a>] <br><kbd>HOUR*</kbd> commands - see under second part of name <br><kbd>HOURLY</kbd> [<a href="#replist">1</a>] <br>Hourly Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>Hourly Summary [<a href="#reptimesum">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>HTML output [<a href="#outstyle">1</a>] <br><kbd>HTMLPAGEWIDTH</kbd> [<a href="#PAGEWIDTH">1</a>] <br><kbd><a name="I">IMAGEDIR</a></kbd> [<a href="#IMAGEDIR">1</a>] <br><kbd>*INCLUDE</kbd> [<a href="#include">1</a>] <br>Inclusions and exclusions [<a href="#include">1</a>] <br>Introduction [<a href="#Readme">1</a>] <br>IP addresses [<a href="#dns">1</a>] <br><kbd><a name="L">LANGFILE</a></kbd> [<a href="#LANGUAGE">1</a>] <br><kbd>LANGUAGE</kbd> [<a href="#LANGUAGE">1</a>] <br>Languages [<a href="#LANGUAGE">1</a>][<a href="#domfile">2</a>] <br><kbd>LASTSEVEN</kbd> [<a href="#LASTSEVEN">1</a>] <br>Licence [<a href="Licence.txt">1</a>][<a href="#Readme">2</a>] <br><kbd>LINKEXCLUDE</kbd> [<a href="#LINKINCLUDE">1</a>] <br><kbd>LINKINCLUDE</kbd> [<a href="#LINKINCLUDE">1</a>] <br><kbd>LOGFILE</kbd> [<a href="#logfile">1</a>] <br>Logfile formats [<a href="#logfmt">1</a>][<a href="#logfile">2</a>] <br>Logfile prefix [<a href="#secondarg">1</a>] <br>Logfiles [<a href="#logfile">1</a>] <br>Logfiles, choosing [<a href="#logfile">1</a>] <br>Logfiles, compressed [<a href="#UNCOMPRESS">1</a>] <br>Logfiles, finding 
[<a href="#start">1</a>] <br><kbd>LOGFORMAT</kbd> [<a href="#logfmt">1</a>] <br><kbd>LOGO</kbd> [<a href="#LOGO">1</a>] <br><kbd>LOGTIMEOFFSET</kbd> [<a href="#TIMEOFFSET">1</a>] <br>Low memory [<a href="#lowmem">1</a>] <br><kbd>*LOWMEM</kbd> [<a href="#lowmem">1</a>][<a href="#cache">2</a>] <br><a name="M">Mailing lists</a> [<a href="#mailing">1</a>] <br>Makefile [<a href="#startux">1</a>] <br>Mandatory configuration file [<a href="#specialcfgs">1</a>] <br>Map [<a href="#map">1</a>] <br><kbd>MARKCHAR</kbd> [<a href="#MARKCHAR">1</a>] <br>Meaning of reports [<a href="#meaning">1</a>] <br>Memory, using less [<a href="#lowmem">1</a>] <br><kbd>MINGRAPHWIDTH</kbd> [<a href="#MINGRAPHWIDTH">1</a>] <br><kbd>MONTH*</kbd> commands - see under second part of name <br><kbd>MONTHLY</kbd> [<a href="#replist">1</a>] <br>Monthly Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br><a name="N">Non-time reports</a> [<a href="#repoth">1</a>][<a href="#othreps">2</a>] <br><kbd>NOROBOTS</kbd> [<a href="#NOROBOTS">1</a>] <br>Numerical addresses [<a href="#dns">1</a>] <br>Numerical hostnames [<a href="#dns">1</a>] <br><a name="O">Operating System Report</a> [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#ARGSFLOOR">4</a>] <br><kbd>ORG*</kbd> commands - see under second part of name <br><kbd>ORGANISATION</kbd> [<a href="#replist">1</a>] <br>Organisations, definition [<a href="#domfile">1</a>] <br>Organisation Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#domfile">4</a>] <br>OS Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#ARGSFLOOR">4</a>] <br><kbd>OS*</kbd> commands - see under second part of name <br><kbd>OSREP</kbd> [<a href="#replist">1</a>] <br><kbd>OUTFILE</kbd> [<a href="#OUTFILE">1</a>] <br><kbd>OUTPUT</kbd> [<a href="#outstyle">1</a>] <br>Output aliases [<a href="#OUTPUTALIAS">1</a>] <br><kbd>OUTPUT 
COMPUTER</kbd> [<a href="#outstyle">1</a>][<a href="#compout">2</a>] <br>Output, configuring [<a href="#output">1</a>] <br>Output style, computer readable [<a href="#compout">1</a>] <br>Output styles [<a href="#outstyle">1</a>] <br><kbd>*OUTPUTALIAS</kbd> [<a href="#OUTPUTALIAS">1</a>] <br><a name="P">Page, definition</a> [<a href="#defns">1</a>] <br><kbd>PAGEEXCLUDE</kbd> [<a href="#PAGEINCLUDE">1</a>] <br><kbd>PAGEINCLUDE</kbd> [<a href="#PAGEINCLUDE">1</a>] <br>Pages, defining [<a href="#PAGEINCLUDE">1</a>] <br><kbd>*PAGEWIDTH</kbd> [<a href="#PAGEWIDTH">1</a>] <br>Path through site [<a href="#webworks">1</a>] <br>Processing Time Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>PROCTIME</kbd> [<a href="#replist">1</a>] <br><kbd>PROCTIME*</kbd> commands - see under second part of name <br><kbd>PROGRESSFREQ</kbd> [<a href="#PROGRESSFREQ">1</a>] <br><kbd><a name="Q">QUARTER</a></kbd> [<a href="#replist">1</a>] <br><kbd>QUARTER*</kbd> commands - see under second part of name <br>Quarter-Hour Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>Quick reference [<a href="#quickref">1</a>] <br><kbd><a name="R">RAWBYTES</a></kbd> [<a href="#RAWBYTES">1</a>] <br><kbd>REDIR</kbd> [<a href="#replist">1</a>] <br><kbd>REDIR*</kbd> commands - see under second part of name <br>Redirected Referrer Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br>Redirected requests, definition [<a href="#defns">1</a>] <br>Redirection Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br><kbd>REDIRREF</kbd> [<a href="#replist">1</a>] <br><kbd>REDIRREF*</kbd> commands - see under second part of name <br><kbd>REF*</kbd> commands - see under second part of name <br><kbd>REFARGSEXCLUDE</kbd> [<a href="#ARGSINCLUDE">1</a>] <br><kbd>REFARGSINCLUDE</kbd> [<a href="#ARGSINCLUDE">1</a>] 
<br><kbd>REFDIR</kbd> [<a href="#hierreps">1</a>] <br>Reference, quick [<a href="#quickref">1</a>] <br><kbd>REFERRER</kbd> [<a href="#replist">1</a>] <br>Referrer, definition [<a href="#defns">1</a>] <br>Referrer Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br>Referring Site Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br><kbd>REFLINKEXCLUDE</kbd> [<a href="#LINKINCLUDE">1</a>] <br><kbd>REFLINKINCLUDE</kbd> [<a href="#LINKINCLUDE">1</a>] <br><kbd>REFREP*</kbd> commands - see under second part of name <br><kbd>REFSITE</kbd> [<a href="#replist">1</a>] <br><kbd>REFSITE*</kbd> commands - see under second part of name <br>Regular expressions [<a href="#aliasregexp">1</a>][<a href="#incregexp">2</a>] <br><kbd>Report.html</kbd> [<a href="#startmac">1</a>][<a href="#startpc">2</a>][<a href="#startos2">3</a>] <br>Reporting bugs [<a href="#mailing">1</a>] <br><kbd>REPORTORDER</kbd> [<a href="#REPORTORDER">1</a>] <br>Reports, list of [<a href="#reports">1</a>][<a href="#replist">2</a>] <br><kbd>REPSEPCHAR</kbd> [<a href="#SEPCHAR">1</a>] <br><kbd>REQ*</kbd> commands - see under second part of name <br><kbd>REQUEST</kbd> [<a href="#replist">1</a>] <br>Request Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#hierreps">4</a>] <br>Requests, definition [<a href="#defns">1</a>] <br>Requests for pages, defining [<a href="#PAGEINCLUDE">1</a>] <br>Requests for pages, definition [<a href="#defns">1</a>] <br>Requests, types of [<a href="#defns">1</a>] <br>Robots, discouraging [<a href="#NOROBOTS">1</a>] <br><kbd>*ROWS</kbd> [<a href="#ROWS">1</a>] <br><kbd>RUNTIME</kbd> [<a href="#RUNTIME">1</a>] <br><a name="S">Sample reports</a> [<a href="#Readme">1</a>] <br>Search arguments [<a href="#args">1</a>][<a href="#ARGSFLOOR">2</a>] -- see also Search Query Report and Search Word Report below <br>Search Query 
Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#SEARCHENGINE">4</a>] <br>Search Word Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>][<a href="#SEARCHENGINE">4</a>] <br><kbd>SEARCHCHARCONVERT</kbd> [<a href="#SCC">1</a>] <br><kbd>SEARCHENGINE</kbd> [<a href="#SEARCHENGINE">1</a>] <br><kbd>SEARCHQUERY</kbd> [<a href="#replist">1</a>] <br><kbd>SEARCHQUERY*</kbd> commands - see under second part of name <br><kbd>SEARCHWORD</kbd> [<a href="#replist">1</a>] <br><kbd>SEARCHWORD*</kbd> commands - see under second part of name <br>Search engines, discouraging [<a href="#NOROBOTS">1</a>] <br><kbd>SEPCHAR</kbd> [<a href="#SEPCHAR">1</a>] <br><kbd>SETTINGS</kbd> [<a href="#settings">1</a>][<a href="#debug">2</a>] <br><kbd>SIZE</kbd> [<a href="#replist">1</a>] <br><kbd>SIZE*</kbd> commands - see under second part of name <br><kbd>*SORTBY</kbd> [<a href="#SORTBY">1</a>][<a href="#SUBSORTBY">2</a>][<a href="#ARGSSORTBY">3</a>] <br>Source code [<a href="#startux">1</a>] <br>Spiders, discouraging [<a href="#NOROBOTS">1</a>] <br>Starting to use analog [<a href="#start">1</a>] <br>Starting to use analog on a Mac [<a href="#startmac">1</a>] <br>Starting to use analog on OS/2 [<a href="#startos2">1</a>] <br>Starting to use analog on Windows [<a href="#startpc">1</a>] <br>Starting to use analog on other platforms [<a href="#startux">1</a>] <br><kbd>STATUS</kbd> [<a href="#replist">1</a>] <br>Status Code Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>STATUS*</kbd> commands - see under second part of name <br><kbd>STYLESHEET</kbd> [<a href="#STYLESHEET">1</a>] <br><kbd>SUBBROW</kbd> [<a href="#hierreps">1</a>] <br><kbd>SUBDIR</kbd> [<a href="#hierreps">1</a>] <br>Subdirectories [<a href="#hierreps">1</a>] <br><kbd>SUBDOMAIN</kbd> [<a href="#hierreps">1</a>] <br>Subdomains [<a href="#hierreps">1</a>] <br><kbd>SUB*FLOOR</kbd> [<a href="#SUBFLOOR">1</a>] 
<br><kbd>SUBORG</kbd> [<a href="#hierreps">1</a>] <br><kbd>SUB*SORTBY</kbd> [<a href="#SUBSORTBY">1</a>] <br><kbd>SUBTYPE</kbd> [<a href="#hierreps">1</a>] <br>Successful requests, definition [<a href="#defns">1</a>] <br>Syntax [<a href="#syntax">1</a>][<a href="#quickref">2</a>] <br><a name="T">Time reports</a> [<a href="#reptime">1</a>][<a href="#timereps">2</a>] <br><kbd>TIMECOLS</kbd> [<a href="#timeCOLS">1</a>] <br><kbd>TIMEOFFSET</kbd> [<a href="#TIMEOFFSET">1</a>] <br>Times, restricting [<a href="#FROMTO">1</a>] <br>Title line [<a href="#LOGO">1</a>][<a href="#HOSTNAME">2</a>] <br><kbd>TO</kbd> [<a href="#FROMTO">1</a>] <br>Total requests, definition [<a href="#defns">1</a>] <br>Translators [<a href="#acknow">1</a>] <br>Tree reports [<a href="#hierreps">1</a>] <br><kbd>TYPE*</kbd> commands - see under second part of name <br><kbd><a name="U">UNCOMPRESS</a></kbd> [<a href="#UNCOMPRESS">1</a>] <br>Unresolved numerical addresses [<a href="#dns">1</a>] <br>Unwanted logfile entries, definition [<a href="#defns">1</a>] <br>Upgrading from earlier versions [<a href="#update">1</a>] <br><kbd>USER</kbd> [<a href="#replist">1</a>] <br><kbd>USER*</kbd> commands - see under second part of name <br><kbd>USERCASE</kbd> [<a href="#CASE">1</a>] <br>User Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br><kbd>USERREP*</kbd> commands - see under second part of name <br><kbd><a name="V">VHOST</a></kbd> [<a href="#replist">1</a>] <br><kbd>VHOST*</kbd> commands - see under second part of name <br><kbd>VHOSTREP*</kbd> commands - see under second part of name <br>Virtual domains/virtual hosts [<a href="#advfaq">1</a>][<a href="#secondarg">2</a>] <br>Virtual Host Report [<a href="#repoth">1</a>][<a href="#replist">2</a>][<a href="#othreps">3</a>] <br>Visitors [<a href="#webworks">1</a>] <br>Visits [<a href="#webworks">1</a>] <br><kbd><a name="W">WARNINGS</a></kbd> [<a href="#WARNINGS">1</a>] <br>Warnings [<a href="#WARNINGS">1</a>][<a 
href="#warns">2</a>] <br><kbd>WEEK*</kbd> commands - see under second part of name <br><kbd>WEEKBEGINSON</kbd> [<a href="#WEEKBEGINSON">1</a>] <br><kbd>WEEKLY</kbd> [<a href="#replist">1</a>] <br>Weekly Report [<a href="#reptime">1</a>][<a href="#replist">2</a>][<a href="#timereps">3</a>] <br>What was new? [<a href="#wasnew3">1</a>][<a href="#wasnew2">2</a>][<a href="#wasnew1">3</a>] <br>What's new? [<a href="#whatsnew">1</a>][<a href="#update">2</a>] <br><a name="Y">Year 2000 compatibility</a> [<a href="#startfaq">1</a>] <p> [ <a href="#A">A</a> | <a href="#B">B</a> | <a href="#C">C</a> | <a href="#D">D</a> | <a href="#E">E</a> | <a href="#F">F</a> | <a href="#G">G</a> | <a href="#H">H</a> | <a href="#I">I</a> | J | K | <a href="#L">L</a> | <a href="#M">M</a> | <a href="#N">N</a> | <a href="#O">O</a> | <a href="#P">P</a> | <a href="#Q">Q</a> | <a href="#R">R</a> | <a href="#S">S</a> | <a href="#T">T</a> | <a href="#U">U</a> | <a href="#V">V</a> | <a href="#W">W</a> | X | <a href="#Y">Y</a> | Z ] <hr> <address><a HREF="http://www.statslab.cam.ac.uk/~sret1/">Stephen Turner</a> <br>Need help with analog? <a href="#mailing">Subscribe to the analog-help mailing list</a> </address> </body></html>